/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103019207 & >>103008519

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>103019207

--Paper: Variational inference for pile-up removal at hadron colliders with diffusion models:
>103022524 >103028691
--Paper: MDM: A diffusion-based approach for complex reasoning and planning:
>103023314 >103023420 >103023535 >103023554
--Paper: AI models and their reflection of creators' ideologies:
>103026352 >103026443 >103026473 >103026530 >103026538 >103028395
--Papers:
>103022742 >103022846
--INTELLECT-1 project discussion and dataset composition:
>103020360 >103020436 >103020446 >103020454 >103020473 >103020565 >103020682 >103020704 >103020505 >103020589
--Synthetic datasets and training data for language models:
>103025196 >103026728 >103026750 >103026812 >103026894 >103026965 >103027294
--OSI declares AI models must disclose training data to be open source:
>103022896 >103023019 >103023127 >103028442 >103028495 >103028608 >103028650 >103028776 >103023316 >103028277
--Discussion of a new AI companion project using llama.cpp:
>103020193 >103020211 >103020299 >103020351 >103020406 >103020485
--gpt-sovits setup and voice cloning experience:
>103019637 >103020547 >103020758 >103023131 >103023143 >103024356
--MaskGCT open source TTS model announcement:
>103027292 >103028638
--MacBook Pro M4 Max specifications and performance discussion:
>103027383 >103027421 >103027507 >103027516 >103027552 >103027642 >103027851 >103027967 >103028014 >103028061
--Layer Skip release and finetuning requirements:
>103023273 >103023513
--Google DeepMind's research on Recursive Transformers:
>103023403
--Discussion on neuron steering and explanation in AI models:
>103026666 >103026676 >103026829 >103026872
--Miku (free space):
>103020019 >103020069 >103020083 >103020578 >103020750 >103020843 >103021618 >103021933 >103023149 >103024265 >103025233 >103027706 >103027863

►Recent Highlight Posts from the Previous Thread: >>103019213

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103029775
Yes, exactly.
It basically spits out in third person and I roleplay in first person.
That somehow works. Hell, it worked with llama 3 8b fine tunes too, but Nemo is just that much better.
>>
File: 1719267688048363.png (39 KB, 574x359)
>>103029905
If I buy a new M4 laptop with 128GB memory, what can I run?
>>
>>103030007
You need 2 to run 405b
>>
>>103030007
Mistral large at Q6?
How does a laptop have enough cooling for that?
>>
>>103030007
Nothing.
>>
>>103030038
>You need 2 to run 405b
and that's super retarded. I got RPC working between a pair of servers with 2xA40 and the perf wasn't nearly as good as I'd hoped (and managing big context was either too tricky for me or outright broken when crossing card boundaries)
>>
>>103030007
depends on what t/s you consider acceptable
>>
Mikulove
>>
miku flaying
>>
>>103030007
good to keep in mind that even the biggest, baddest apple silicon cpu can only manage half the prompt processing speed of a 3090 under ideal circumstances, so ignoring that part of the equation will lead to tears later
>>
High impact Mikuviolence
>>
>llama4 coming out in a few months
we are so back.assistant
>>
>>103030443
Thanks for reassuring me that it is not all refugees.
>>
>>103030405
https://www.youtube.com/watch?v=9oRnVn4aqpM
>>
>>103030443
Define "few months".
>>
hi sirs please to kindly suggest the model to helpful do uncensored fast and no money thanks you sers
>>
>no news in 5 days
dead hobby, closed source gigachads won
>>
>>103030607
>dead hobby, closed source gigachads won
Newsflash: wearing IoT cock cage is not peak masculinity.
>>
I need someone (female) to put a cock cage on me...
>>
Open source lost
Local lost
Many must —ack
>>
>>103030747
Dear transgender,

Unlike you, I am not obsessed with software to the point of committing suicide.

Hope it helps!

-Straight White Man
>>
>>103030747
Six more months until AGPL rugpull.
Trust the plan.
>>
>>103030747
repeat after me: "Just because other people have gotten better stuff does not make my stuff worse. It's still the same as it ever was". Don't be a consoooomer pleb retard
Also, local open weight models are kickass right now so I don't even know what you're trying to say
corposhit is better in a few niches, but the ability to fully control local means that ten thousand other avenues open up that are functionally impossible with closed models
>>
>>103030917
>Six more months until AGPL rugpull.
niggerganov is mitcuck he won't do shit
>>
Is 7900xtx a good buy? Or two 7900gres? Or, is 4080s vastly better than xtx?
>>
>>103031033
AMD
>>
>>103030747
They showcased it because it is supporting Apple's new efforts with MLX and I don't know of any other project that is.
https://github.com/lmstudio-ai/mlx-engine
Makes sense on top of it being pretty, but it is hilarious that Apple thinks their laptops are in any way advantageous for running the models they showed, when a cheaper laptop with a 4080 and CUDA would crush it in ML. I will give Apple credit though: with other workloads like Blender, the M3 Max is a bit above the 4070 in rendering, and an M4 will probably equal or exceed that in a laptop. Still will be shit at games though.
>>
>>103031033
>AMD
>>
>>103031033
Get a 3090 or 4090 or 5090.
>>
>>103031125
>>103031115
>>103031064
Loonix, so AMD is preferable. So which one? Can I run a 70B model with the XTX?
>>
>>103031150
3090
>>
>>103031150
You can probably run 70B at a really low bpw, yeah.
>>
We see lots of tables of benchmarks.
But what are their settings?
We have so many dials to turn: Temperature, Repetition Penalty, Top P, Top K; Min P and Top A and other stuff.
Changing them sure can change the quality of the output.
Prompt format matters too. Some models take L3 and Mistral and CommandR, others either get the one they want or they cough up nonsense.
What are the true/default/canonical/deterministic/correct settings?
>>
>>103031160
I can't get used cards around here.
>>
>>103031179
Greedy. The only way to judge a model
>>
>>103031197
ebay doesn't ship to your region? There are endless $500-ish 3090s on there
>>
>>103031237
greedy search, i.e., always picking the token with the largest probability as the next token.
It will always tell you how the model naturally performs, which will be its path of least resistance and greatest potential
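In code it's just argmax in a loop. Rough sketch, assuming a HuggingFace-style causal LM (standard transformers calls, nothing backend-specific):
[code]
import torch

def greedy_decode(model, tokenizer, prompt, max_new_tokens=64):
    # Greedy search: at every step take the single most probable token.
    # No temperature, no top-k / top-p / min-p, no repetition penalty.
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]                   # logits for the next position only
        next_id = torch.argmax(logits, dim=-1, keepdim=True)   # the "largest probability" token
        ids = torch.cat([ids, next_id], dim=-1)
        if tokenizer.eos_token_id is not None and next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0])
[/code]
Same prompt in, same text out, every time. That's why it's the cleanest way to compare models.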
>>
>>103031338
You misclicked.
>>
>>103030946
>local open weight models are kickass right now so I don't even know what you're trying to say
It is probably a caigger saying that. Is cai actually that bad?
>>
File: MikuHalloweenSpecial.png (1.23 MB, 832x1216)
Halloween migu
>>
>>103031385
You made me refresh my page and there is not a single thing dancing on my 4chan; it's not Halloween yet till that happens.
>>
>>103031390
It's Halloween Eve.
>>
>>103031179
been my experience lately
trying various nemo models, lots of story writer tunes built on the base - but all of them have been legitimate retard tier excepting top-p somewhere around 0.5. I hear talk of low Temperature (0.2-0.5) which also seems to help, but dialing top-p above 0.5 (or even much below) and forget article time frequent conjunction yes.

Maybe some of these models are great, but without any insight on what parameters were used when benchmarking - not the least of which is prompt format - there's no way of guessing how close any of your outputs will remotely reflect the potential of that model.

This seems more of a problem now than it has been historically. Used to be I could throw koboldai's default settings at everything and get mostly coherent outputs. These days it seems only Llama 3.x models tolerate those settings, and finetunes have their own preferences.

All that said, wasn't the GGUF format supposed to include all sorts of metadata to help mitigate this bullshit, or did I misunderstand what metadata the file format is supposed to include? I wasn't really paying attention at the time, because turning the dials wasn't that big of a deal until the last 6 months or so.
>>
>reading layerskip's paper to replicate their training code, because they do not provide any
>inconsistencies between "applying layer dropout" and "we actually compute the loss across outputs from all layers to make the final loss"
>mentions a curriculum function, but only mentions what it's supposed to do
>piecewise function with an "i" variable but doesn't define what it entails
>broken latex when defining "hyperparameter" constants
Agony.
>>
Caiggers need not answer. When was the last time you loaded up some kind of finetune and felt a genuine improvement in cooming quality? When was the last time you loaded up a new base model or instruct and this happened? And finally when has that feeling persisted after initial honeymoon phase?
>>
>>103031390
its been halloween in japan for over 6 hours already
>>
>>103031237
>>103031338
Kobold isn't showing me a setting for that.

>>103031417
>This seems more of a problem now than it has been historically
I've been kinda all over.
Temperature: I figured lower would be more stable, but 0 isn't allowed by Kobold. The slider stops at 0.10 and if I put in 0 it goes to 0.01. Is there a divide-by-Temp in the math making 0 or <1e-2 a problem?
Rep penalty slider stops at 1 but if I enter 0 it goes to 0.10; I wonder again if it's being divided by, and if it can go so low why isn't the slider going down there? And does it even matter? If it makes the model use synonyms for slop it's still slop, and penalty might screw up Q&A where a specific term of art might need to be used many times.
Top-P and Top-K, not sure, some models seem not to care, others go gibberish if I screw with them.

>>103031499
It's not spooky till sunset. And then it's over 7 hours later at midnight.
What a rip off. Best holiday, worst schedule.
>>
>>103031453
Finetune: Bondburger or Fish, later Sorcerer 8x22b. All major improvements from the base model and avoid the positivity bias/alignment issues.
Base Model: Nemo 12b (fast, output surprised me for its size, much less aligned than the 8b llama models and smarter)
Past honeymoon phase: Sorcerer 8x22b. Still my daily driver for non-productivity tasks like RP. Faster than 123b but still maintains much of the coherence
>>
>>103031448
Here's some code from 2023 that might help
>https://github.com/ggerganov/llama.cpp/pull/3565
>>
>>103031417
>low Temperature
Mistral Nemo is supposed to be used at low temp. It's on their readme. You only download pre-converted models, don't you?
>top-p
You mean top-k prob. top-p is a different thing.
>insight on what parameters
None of the samplers change the order of the tokens unless temperature equalizes the top-k N tokens. Temperature is never that high. Only top token is taken, no need for other samplers.
>not the least of which is prompt format
For instruct models, the model's prompt template is used.
>could throw koboldai's default settings at everything
Most models can follow
charA: dialog
charB: dialog

type of output just fine. It's how base models were used for dialog back then, and how they can still be used right now, be them instruct or not. Guess what one of the defaults on koboldai is...
>finetunes have their own preferences
No shit.
>All that said, wasn't GGUF format supposed to include all sorts of metadata to help mitigate this bullshit
They don't yet have a jinja parser. The point is for the user to have the data available. Programs that can parse jinja and load gguf files have the data available, but llama.cpp doesn't force you to use them. You can try to use whatever you want. The idea is to have a self-contained file for everything else.
>because turning the dials wasn't that big of a deal until the last 6 months or so.
Few of them are worth it. There's only a few to play with if you understand what they do.
>>
>>103031552
>What a rip off. Best holiday, worst schedule.
Sucks that it landed in the middle of the week this year. It's going to come and go so quick.
>>
>>103031552
>Kobold isn't showing me a setting for that.
Use https://artefact2.github.io/llm-sampling/ to figure out a setting that gives you only one token choice at all times. That's null sampling, which equates to what greedy would do.
>>
>>103031552
>Doesn't understand greedy
Top-k 1, disable everything else.

If your model needs repetition penalty, it's shit. Change it.
Temperature 1 is the 'normal' token distribution. Most models recommend about 0.8. Mistral nemo at about 0.3. 0 doesn't make sense.
Rep penalty is multiplicative. 1.01 increases the penalty, 0.99 decreases it. Not worth using on a good model.
Top-p is deprecated. Top-k or min-p and temperature will do 99% of what you need.
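If it helps to see it as code rather than sliders, here's roughly what those knobs do to the logits. This is a simplified sketch, not llama.cpp's or kobold's actual sampling code, and real backends differ on the exact order they apply things:
[code]
import torch

def apply_samplers(logits, prev_ids, temperature=0.8, rep_penalty=1.0, top_k=40):
    # logits: 1D tensor over the vocabulary for the next token.
    logits = logits.clone()
    # Repetition penalty is multiplicative on already-seen tokens:
    # >1.0 pushes them down, <1.0 pulls them up, 1.0 is a no-op.
    for t in set(prev_ids):
        logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
    # Temperature divides the logits: <1.0 sharpens the distribution, >1.0 flattens it.
    # 0 would be a division by zero, which is why backends special-case it as greedy.
    logits = logits / temperature
    # Top-k: keep the k highest logits, mask out everything else. Top-k 1 = greedy.
    if top_k > 0:
        kth = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits[logits < kth] = float("-inf")
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, 1).item()
[/code]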
>>
svelk
>>
>>103031325
Nein. What about two 7900gres more vram is better?
>>
>>103031573
I miss KerfuffleV2. He was always super humble. I hope he's doing ok.
>>
>>103031453
>finetune
my meme merge :^)
>base/instruct
mistral large
>>
No new sota model since largestal.
Its never been more over
>>
>>103031448
https://github.com/facebookresearch/LayerSkip
>>
What happened to speculative decoding and lookahead decoding? Did ggerganov abandon it?
>>
Holy shit bros, I created a super low-effort assistant and it's acting like an anti-adhd/laziness/procrastination bot. Extra executive function in a bottle.
Context:
You are an assistant that will help in the work the user tells you about below. You will help by answering in short sentences. You will NOT provide long responses, lists or bullet points. You can ask or answer questions, but will not infodump.
In summary, this conversation should be back and forth with the user. Be sure not to do the work but only to assist.
Greeting:
Hi, what are we working on today?

Basically reverse-CoT, making the human think but keeping things on the rails and giving useful advice/asking the next most useful question. If you get stuck, you can just ask it a question or for suggestions and it won't yap too much. Just enough to get you back on track.
I found using a standard assistant would dump out reams of relevant stuff, but then I'd procrastinate taking that and turning it into anything useful, whereas this way feels good, man.
>>
https://www.youtube.com/watch?v=HaAIsyP4JPc

https://www.nextsilicon.com/
>>
>>103031885
Buy an a- oh wait, you don't have a product!
>>
>>103031875
Both exist as examples when using the llama.cpp C/C++ API but are not available in the HTTP server.
I have mostly worked on lookahead decoding, the problem with it is that it just does not give a speedup that is very large or consistent and that existing speedups diminish as the vocabulary size increases.
llama.cpp training is on track for the end of 2024, one of the things that I plan to try with it is distillation of models for use with speculative decoding.
>>
>>103031921
>distillation of models for use with speculative decoding.
is the idea that you'd use the smaller, distilled model for high-confidence tokens and hit the big model when there's more ambiguity?
>>
>>103031631
So I can just put 1 in all four of the main sampler fields in Kobold settings and let it roll?

Why isn't that default/preset?
>>
>>103031958
No, the distilled model is used to draft tokens one at a time and the big model is then used to validate the drafted tokens all at once.
This is potentially faster because the runtime increases less than linearly with the number of tokens per batch (this is also why prompt processing is so much faster than generating tokens).
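Very roughly, and leaving out all the llama.cpp details, it looks like this (the two model methods below are made-up names for illustration, not a real API):
[code]
# Rough sketch of greedy speculative decoding, not the actual llama.cpp code.
def speculative_step(draft_model, big_model, ids, n_draft=5):
    # 1. The small (distilled) model drafts n_draft tokens one at a time.
    draft = []
    ctx = list(ids)
    for _ in range(n_draft):
        tok = draft_model.greedy_next(ctx)       # cheap per-token call (hypothetical method)
        draft.append(tok)
        ctx.append(tok)
    # 2. The big model scores the whole draft in a single forward pass.
    #    This is the win: runtime grows sub-linearly with tokens per batch.
    big_preds = big_model.next_tokens_for(ids, draft)   # one prediction per draft position (hypothetical)
    # 3. Keep drafted tokens while the big model agrees, then take the big
    #    model's own token at the first disagreement.
    accepted = []
    for d, b in zip(draft, big_preds):
        if d == b:
            accepted.append(d)
        else:
            accepted.append(b)
            break
    return ids + accepted
[/code]
Output is identical to running the big model alone; you only save time when the draft model guesses right often enough.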
>>
>>103031921
>I have mostly worked on lookahead decoding, the problem with it is that it just does not give a speedup that is very large or consistent and that existing speedups diminish as the vocabulary size increases.
5% speedup is a 5% speedup, I'd take it, even if it's not applicable everywhere. Maybe for large vocab you can add some kind of user-defined filter to for example exclude all non-latin tokens?

>llama.cpp training is on track for the end of 2024, one of the things that I plan to try with it is distillation of models for use with speculative decoding.
Nice to hear that it isn't abandoned. It would be nice to be able to distill 100B model into 10B.
>>
>>103032003
>5% speedup is a 5% speedup, I'd take it, even if it's not applicable everywhere.
Give me a cloning machine and I'll do it.
As it is I need to prioritize what I work on and this simply didn't make the cut.
>>
>>103031994
I don't use kobold, but if greedy sampling is what you want, that's typically the way to do it.
>Why isn't that default/preset?
People like their samplers. If you see overly-specific sampler settings (as in "exactly 1.0236475 is good for rep pen, but 1.0236476 is not") be suspicious. Same for all samplers.
If you want variety (what some call "creativity"), however, higher temperature and higher top-k (or lower min-p) help. You could also use DRY and/or XTC, but i haven't tried them.
>>
>>103032102
Yah, samplers tend to solve specific problems and their misuse probably causes all sorts of problems for llm users. It's too bad they aren't named in some way that's easy to intuit for the average human.
e.g. These will be highly model and situation specific, but here's my juuuust coherent deepseek preset for starting out a card that needs to seed a bunch of "random" values to get going (and then get dialed back down)
temperature: 2.6
min_p: 0.0065
top_k: 200
>>
anyone merge the new booba? anything broken this time?
>>
hi guys, I tried to get local models running a while back on only one (1) consoomer card (3070, 8gb of vram). I could get a 7B model going pretty fast, and an anon told me you can get better models going with offload but I can't figure out how to get it to work. Is it supported on textgen-webui? do I need to use kobold?
>>
>>103032389
>Is it supported on textgen-webui?
Using llama.cpp yes.

>do I need to use kobold?
You don't need, but it's less of a headache if you are going to use ggufs anyway.
Also, download rocinante v1.1 Q5_K_S or whatever. You should be able to run that with a little over 8k context with most layers in VRAM.
>>
>>103032389
You need to use the GGUF format to offload to CPU.
Ooba (textgen-webui) should have it.
If you find GGUFs made with imatrix files (e.g. iMat) you can go smaller with the quants without losing as much coherence.
With 8gb VRAM you could probably run a Q3_K_S quant of Mistral-Nemo-Instruct-12B and still have a bit of VRAM left over for context
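If you'd rather script it than click through the ooba/kobold menus, the llama-cpp-python bindings expose the same offload knob. Sketch below; the filename is just an example and you'll have to tune n_gpu_layers down until it stops running out of VRAM on 8GB:
[code]
# pip install llama-cpp-python  (built with CUDA/ROCm support for GPU offload)
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Nemo-Instruct-12B-Q3_K_S.gguf",  # example filename, use whatever quant you grabbed
    n_gpu_layers=28,   # layers pushed onto the GPU; the rest run on CPU from system RAM
    n_ctx=8192,        # context length; the KV cache also eats VRAM, so budget for it
)

out = llm("Write a short story about a cat named Evil Bob.", max_tokens=200)
print(out["choices"][0]["text"])
[/code]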
>>
File: 1725991413507490.png (51 KB, 858x112)
>>103031879
It's pretty fun. Got this gem after a few dozen one liners and absolutely no lewd stuff.
>>
>>103032389
What's your system RAM like? If you're going larger than VRAM (I'm 12GB and I've never found a model that fits and isn't shit) then AVAILABLE system RAM becomes your limit and you accept something like one token per second generation rates. If you go over that, you'll be paging and then gen rates become nearly nothing and you're thrashing your drive.

iMat and i1 are good, don't go below Q4 unless it's IQ3. The IQs are dumb, but Q3_K and lower are lobotomized.
>>
>>103032408
>>103032442
alright thanks guys. my initial plan was to run nemo but I'll try this Rocinantes model too
>>
>>103032468
>Rocinantes
It's a nemo fine tune.
The best one if you ask me, I've been shilling it for a couple threads now after having great experiences with tons of wildly different character cards.
>>
File: teto-trio.png (1.5 MB, 832x1216)
>>103032468
Try both, you'll appreciate the difference between the base and finetune more
And if you're offloading to CPU you can probably do a bigger quant like >>103032408
said, just depends on the speed/quality tradeoff you want.
>>
>>103032467
I have 16GB of ram
>>103032481
>>103032492
thanks for the pointers <3
>>
>>103032492
offtopic, but what are you prompting to get those nice, thick, weighted outlines in your gens?
>>
>>103032514
>16
I guess you could try a Mistral Nemo Q6 or Mistral Small i1-IQ3_M. My notes have both at 9.4 GB. I think Nemo has a Q4_K_S at 6.6GB; that might fit your video card, though you probably wouldn't have space for any meaningful amount of context.
>>
I just tried holding a philosophical discussion with mystery models on lmsys. What a waste of time. It doesn't matter if you provide sound arguments or autistically screech at them, they will go back to the establishment values and will endlessly moralize. They will not even try to debunk what you say. I miss the early days.
>>
Why is nobody using Aphrodite Engine?
>>
>>103032252
>temperature: 2.6
>min_p: 0.0065
>top_k: 200
Well. This is exactly what I was talking about.
Have you tried 0.0066 and 0.0064? Did rounding to 0.007 or 0.006 not work? No? Just "feel"? For min-p, a value of 0.01 is already considered really low. Doesn't matter what temperature you have, the bottom tokens are never selected.
And then top-k 200, which has the same problem. You're gonna have a tough time getting anything lower than top-k ~50 selected, again, regardless of the high temp.
Not only that, min-p and top-k serve a similar purpose, making one redundant when the other is set.
Top-k N removes tokens after index N, so only the top N tokens are left for selection. Min-p F removes the tokens whose probability is lower than F relative to the top token's probability.
All overly specific values with redundancies.
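To make the redundancy concrete, here's roughly what those two filters do to the probabilities. Simplified sketch, not any backend's actual code:
[code]
import torch

def filter_top_k_min_p(probs, top_k=200, min_p=0.0065):
    # probs: 1D tensor of next-token probabilities (already softmaxed).
    keep = torch.ones_like(probs, dtype=torch.bool)
    # top-k: keep only the K most probable tokens.
    if top_k > 0:
        kth = torch.topk(probs, min(top_k, probs.numel())).values[-1]
        keep &= probs >= kth
    # min-p: keep tokens whose probability is at least min_p * P(top token).
    keep &= probs >= min_p * probs.max()
    filtered = torch.where(keep, probs, torch.zeros_like(probs))
    return filtered / filtered.sum()   # renormalize before sampling
[/code]
With min_p that small, the min-p line keeps basically everything the top-k line already kept, so one of the two is doing nothing.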
>>
>>103032278
git is spooky, innit?
>>
File: Untitled.png (47 KB, 1115x628)
i always go for Q4_K_M 12b's on my 8gb vram setup at 8k context
~7tk/s streaming in SSE is a comfy speed
>>
>>103032656
What is its competitive advantage over the likes of llama.cpp and exllama?

>>103032681
Same..
>>
>>103032656
It's pythonshit.
>>
>>103032656
Because it's just a vLLM fork. And the way they did the fork puts a lot of maintenance burden on them to keep it updated, and vLLM is a project that moves fast. It's kind of like the koboldcpp/llama.cpp situation but worse, because Aphrodite modifies a lot of files for branding.
>>
>>103032567
IllustriousXL with SEGAttention
>>
>>103032850
masterpiece, best quality, extremely detailed, close-up, fang, gorgeous, perfect, elegant, kasane teto, red hair, twindrills, vampire bat dress, demure
>>
What quant do you run if you can't fit Q4_K_M?

Q4_K_S or iQ3_M?
>>
>>103032911
IQ4_XS
>>
File: new_i_quants.png (10 KB, 792x612)
>>103032911
I tend to default to the QK quants.
>>
Why is temperature-last better?
>>
>>103032656
Because it's Linux only
>>
>>103032035
When will ggerganov add new IQ quants that kawrakow made?
>>
>>103032983
It's not.
>>
>>103030296
what kind of performance can you expect from the M chips in inference?
>>
>>103032911
Q number is king.
6 is optimal. 8 is either 6 or so rarely different that it's a rounding error. 5 is fine, 4 is okay but it's starting to suffer. At 3, go IQ3. Q_K is falling apart at that point.

The letters after the IQ or Q_K number, consider them more like flavors than differences. Try them all and go with the one that you like best. Bigger isn't necessarily better. Some anons here were favoring S over M because S seemed to be better at recalling facts than M, which is a mixture of quant levels.

iMatrix and i1 are nice to have, but still, alternative flavors, test and then decide.
>>
>>103032983
You trade in soul for more easily controlled model behaviour.
>>
>>103030007
So are mac really bad with large context? I mainly want it to use a bunch of context for code with large models
>>
>>103032995
No idea.
>>
>>103033071
Are they still having their little drama?
>>
>>103033081
Don't know.
>>
>>103033089
Do you still collect blacked miku photos?
>>
>>103033089
When will we get jamba and vision?
>>
>>103033089
How's the training code doing? Any surprises, positive or negative?
>>
>>103033115
Don't know.

>>103033126
It's alright I guess.
The memory allocation is tricky to get right.
>>
Update from 2 days ago
Soon I will have something fully automated for novel translations.
I've decided that UI is gay so instead I am just doing command line
>>
>>103033152
Please share the project, I'm interested
>>
>>103033172
Will do once it can at least automatically crawl a ncode.syosetu.com novel and automatically queue all chapters.
There is a lot of work needed in configuring the local LLM too, so far it is barely above google translate, but that's mostly because I am really bad at prompting
>>
>>103033186
thanks anon
>>
>>103032995
ik was never kicked from llama.cpp, he just stopped contributing for reasons that he never really explained publicly
>>
File: 74GD.png (172 KB, 900x697)
I kneel
>>
>>103033535
Slop apparently is actually what "humans" like. It's over.
>>
>>103033535
Gemini is actually retarded though. That would be like giving mythomax top place.
>>
>>103033535
We need better humans.
>>
I am waiting for november 5th but I am wondering if we can even get a perfect coombot with all those incremental upgrades? Can you really just cram more tokens and pretrain for longer and have an "unsafe" dataset and it is just gonna work at some point? I can't help but think that the high context degradation will only become worse or stay the same and you will never get the model to actually surprise you with stuff you would want to be surprised by.
>>
>>103033617
yeah
>>
File: file.png (193 KB, 800x700)
>>103033590
There will only be more synthetic slop saturation of datasets, more dataset sanitation, more safety alignment and preference benchmarks becoming less and less reliable because of pic related. This is the end.
>>
Is there a way to run koboldcpp using ZLUDA on windows? I am using the ROCm fork which gives great speeds in prompt processing using hip (20+t/s) but generation is still done on CPU at 0.8t/s :(
>>
File: EQ bench.png (98 KB, 976x899)
>>103033535
memebenches
>>
>>103033652
I hope that in the future, when h100s become cheap, local organizes and trains a model on unfiltered dataset. So many variables have to align... Starting with elections. If Kamala wins, goodbye freedom, if Trump, there is a chance that he will go after woke corpos and will do everything to fuck them over.
>>
>>103033676
Take into account this is not some sort of social intelligence test. I tried those 9B and they are too dumb to do anything complicated.
>>
>>103033617
We literally already made it to the finish line with Largestral (and its finetunes), the only thing left now is to wait for hardware advances to make it easier to run.
>>
>>103033706
>Largestral
It still lacks a ton of fandom knowledge sadly. Hermes 405B is the only local one that is good enough atm imo.
>>
>>103033089
ollama is better. It has static bindings of rocm on Linux.
>>
File: memebench-sorted.png (591 KB, 1388x3321)
>>103033676
Goodhart's law in action. No benchmark is immune from it. Some may hold on for a while, but even they become useless over time.
>>
>>103033562
It's Indians.
>>
>>103033676
I always thought the EQ Bench was stupid. Imagine thinking that asking LLMs if other LLMs did a good job at subjective tasks like creative writing was a good idea.
>>
>>103033718
no, just no.
>>
>>103033947
Yea. NO large mistral or 70B has known how to play my waifu well. 405B knows her and her universe in and out.
>>
>>103033999
>>103033947
It wouldn't be surprising. Trivia is one of the things that total parameter size benefits from the most. More than "reasoning" capability for sure.
>>
>>103033999
Trips confirm.
I've been very disappointed in L3, Mist Large, and whatever else I've tried for what should be basic pop culture knowledge if it read some Wikipedia.

But 405B is a bit too thicc when I'm barely able to fit a 70B Q6 gguf.

Suffering.
>>
>>103034085
Here's hoping they do a M4 ultra with 256GB. Prob cost 8K but at least more people will be able to use it without needing to install more breakers and double their electric bill.
>>
>>103034099
>double their electric bill.
My waifu is not fat!
>>
Could an AI song cover bros help me?
I want to hear Disney's Goofy sing Lil Baby - Pure Cocaine. Please, upload it to YouTube and share the link.
>>
>>103034099
We just need a way to cram 405B into a sixth of the space.
Can true Bitnet save us?
Or will it be too aligned and respectful when it finally arrives?
>>
>>103034099
>double their electric bill
In the winter it's just an expensive and loud heater, so no extra spending.
>>
>>103034327
But you make up for it in summer when it's AI + air conditioning.
t. deep south = what is "winter?"
>>
>>103034336
In the summer it almost cooked me alive.
t. euro = what is "air conditioning?"
>>
File: 1703967934013355.png (775 KB, 688x474)
Friendly reminder for polturds :)
>>
>>103034368
I've heard a lot of strange things about European domiciles and I'm willing to believe that all of them are true.
>>
Sovits 0-shot going for a cute laugh: https://voca.ro/1ar1PvfLw672
>>
>>103033706
We made it with Qwen2.5.
>>
>>103034558
Smart but too dry, Largestral tunes don't have that problem.
>>
>>103034327
In my case, unfortunately, it's more than doubled. I'm paying more for higher amps on my electric plan.
>>103034336
I moved it away from my room. Closer to the breakers and it no longer heats up my ass.
>>
So what's the meta for ~70b? is it really miqu even after all this time?
>just use q3 largestral or whatever
it's too slow
>>
>>103034583
>be me
>check UPS
>360W
I'm not only a vramlet, I'm a wattlet.
>>
File: 1729969856951604.png (638 KB, 677x1065)
>>103034558
While people complain about Qwen speaking Chinese, Mistral dropped 我爱你，我的王子。 ("I love you, my prince.") out of the blue, with no non-ASCII symbols in either card or chat. Has the CPC hacked my GPU?
>>
>EDLM, our brand-new Energy-based Language Model embedded with Diffusion framework
>We (for the first time?) almost match AR perplexity
>Significantly improved generation quality
>Considerable sampling speedup without quality drop
arxiv.org/abs/2410.21357
https://x.com/MinkaiX/status/1851748096973377720
They also draw inspiration from ylecun's works.
>>
>>103034668
What is AR perplexity?
>>
>>103034582
Try Magnum 72B. The Largestral one was worse than the original model.
>>
>>103034707
My assumption is AR means autoregressive in this context.
>>
>>103034668
>Energy-based
They captured lighting elementals and put in the computer?
>>
File: 1706826527674085.png (265 KB, 512x512)
>>103034640
Sacre bleu! It sounds like someone is using the wrong tokenizer with their mistral model
>>
File: Untitled.png (1.59 MB, 1080x3292)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
https://arxiv.org/abs/2410.23168
>Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce TokenFormer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This reformulation allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs, achieving performance comparable to Transformers trained from scratch while greatly reducing training costs.
https://github.com/Haiyang-W/TokenFormer
https://huggingface.co/Haiyang-W
pretty interesting
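As far as I can tell from the abstract, the core trick is cross-attention where the keys/values are learned parameters instead of coming from the input. Toy sketch of my reading of it; the paper uses a modified attention function and their real code is in the repo above, so treat this as the shape of the idea only:
[code]
import torch
import torch.nn as nn

class TokenParamAttention(nn.Module):
    # Toy version: input tokens attend over a bank of learned "parameter tokens"
    # (keys/values), replacing a fixed linear projection.
    def __init__(self, dim, n_param_tokens):
        super().__init__()
        self.param_k = nn.Parameter(torch.randn(n_param_tokens, dim) * 0.02)
        self.param_v = nn.Parameter(torch.randn(n_param_tokens, dim) * 0.02)

    def forward(self, x):                          # x: (batch, seq, dim)
        attn = torch.softmax(x @ self.param_k.T / x.shape[-1] ** 0.5, dim=-1)
        return attn @ self.param_v                 # (batch, seq, dim)

    def grow(self, extra_tokens):
        # "Scaling" = appending new key/value parameter pairs; the old ones are
        # untouched, which is why you don't retrain from scratch.
        dim = self.param_k.shape[1]
        self.param_k = nn.Parameter(torch.cat([self.param_k.data,
                                               torch.randn(extra_tokens, dim) * 0.02]))
        self.param_v = nn.Parameter(torch.cat([self.param_v.data,
                                               torch.randn(extra_tokens, dim) * 0.02]))
[/code]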
>>
>>103034797
>Let's not bother training a better model, we'll simply add more parameters to our current one, after all, VRAM is cheap
Grim.
>>
>>103034875
It's not for you, it's for people who actually matter.
>>
>>103034875
It literally is cheap, just not for us. Mostly due to cartel dynamics rather than actual market forces.
>>
>>103034938
Yes, it wasn't a joke. Growing the model's parameter count for cheap is what ultimately kills lmg
>>
>>103034627
For ERP, unironically Sao10K/L3-70B-Euryale-v2.1.
For non-erotic RP, nvidia/Llama-3.1-Nemotron-70B-Instruct-HF. It responds to style instructions.
For general instruction following either nvidia/Llama-3.1-Nemotron-70B-Instruct-HF or Qwen/Qwen2.5-72B-Instruct.
>>
>>103035090
Buy a fucking ad, asshole.
>>
>>103035094
Buy THIS *unzips penis*
>>
*BRAP*
>>
>>103035090
>Euryale in Oct 2024
Kys tourist
>>
>>103035130
>Keep yourself safe
Based
>>
>>103035137
Keep slurping your slop retard
>>
>>103035154
Skill issue :3
>>
File: file.png (118 KB, 471x171)
>>103034640
NONFUNCTIONAL POCKETS
USELESS POUCHES EXCEPT FOR CARRYING A LIP STICK
ABSOLUT RETARDED "PANTS"
AAAAAAA
>>
File: 1712988132026928.png (118 KB, 1450x907)
aaa
>>
>>103035090
>For general instruction following either nvidia/Llama-3.1-Nemotron-70B-Instruct-HF or Qwen/Qwen2.5-72B-Instruct.

llama has ferocious "I won't talk about that" and wokitis. It's highly obnoxious.
>>
>>103034640
I have an unhackable AMD gpu. :^)
>>
>>103035230
>>
>>103035154
It's a competent fine tune.

https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard

>UGI = average of last 5 categories
>Obedience = A more narrow subset of the UGI questions, solely focused on measuring how far a model can be pushed before going against its instructions, refusing to answer, or adding an ethical disclaimer to its response.
>Naughty Intelligence = The average score of the UGI questions with the highest correlation with parameter size. This metric tries to show how much intrinsic knowledge and reasoning the model has.
>Unruly Knowledge = Knowledge of activities that are generally frowned upon
>Internet Knowledge = Knowledge of various internet information, from professional to deviant
>Real Stats = Ability to provide statistics on uncomfortable topics
>Offensive Stories/Jokes = Ability to write and understand offensive stories and jokes
>Controversial Knowledge = Knowledge of politically/socially controversial information

Sao10K/L3-70B-Euryale-v2.1
>UGI: 55.56/100
>Obedience: 9.1/10
>Naughty Intelligence: 6.34/10
>Unruly Knowledge: 66.7/100
>Internet Knowledge: 42/100
>Real Stats: 50.9/100
>Offensive Stories/Jokes: 56.3/100
>Controversial Knowledge: 62/100

vs
failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
>UGI: 42.06/100
>Obedience: 5.9/10
>Naughty Intelligence: 5.02/10
>Unruly Knowledge: 57.5/100
>Internet Knowledge: 45.5/100
>Real Stats: 45.3/100
>Offensive Jokes/Stories: 33.4/100
>Controversial Knowledge: 28.7/100

vs
miqudev/miqu-1-70b
>UGI: 39.15/100
>Obedience: 3.6/10
>Naughty Intelligence: 4.54/10
>Unruly Knowledge: 36.7/100
>Internet Knowledge: 42.9
>Real Stats: 41.4/100
>Offensive Jokes/Stories: 40.5/100
>Controversial Knowledge: 34.3/100

vs
sophosympatheia/Midnight-Miqu-70B-v1.5
>UGI: 30.46/100
>Obedience: 3.6/10
>Naughty Intelligence: 4.16/10
>Unruly Knowledge: 37.5/100
>Internet Knowledge: 21.9/100
>Real Stats: 31.6/100
>Offensive Jokes/Stories: 32.6/100
>Controversial Knowledge: 28.7/100
>>
>>103035269
It occurred on a 6800xt
>>
>>103035309
I wonder how much of a debuff these corporate slops suffer because they have castrated their models to not answer, or to remove answers, involving real-world information that disagrees with their politics
>>
>>103035230
Name: , barely above a whisper
Regex: /, (?!is|are|was|were)(\S* )?(voices? )?barely (above a \w*|a whisper|audible)/g
Replace with:

Name: barely above a whisper
Regex: /( is| are| was| were)? barely (above a \w*|a whisper|audible)/g
Replace with: $1 {{pick: quiet, hushed, soft, lowered}}

First one nukes most clauses, ignoring is/was (rare). The second one substitutes the remaining cases.

>Self-reminder
This sounds like a user's note to self rather than an instruction to the model. Better to regex the phrase out or it may switch to barely a murmur or something. Depth 4 won't have a strong effect. Try depth 0 with something like
[Response rules: Do not mention eyes in the first paragraph. The last paragraph must only contain dialogue or observable narration.]
(if the model doesn't spam eye slop then no need to mention it; add your other rules).
>>
Is it over? Be honest.
>>
File: file.png (154 KB, 798x686)
>>103035230
>651 tokens A/N
I guess you may be using other stuff for depth 4 but for immediate response rules I put it in a global lorebook. Discovered the Inclusion Group feature that makes only one of the tagged group appear at once, letting me auto toggle instructions specifically for OOC.
>>
>>103035309
>8k context
I wrote it off several months ago because it was extremely retarded and horny.
>The 3.1 version is 20 points lower
Just use Nemotron for ERP if for some reason you want to stick with Llama.

If we're going to have people like you, who treat this benchmark like gospel, I think it shouldn't be in the OP at all. What benefit did it ever bring to anyone? None.
>>
>>103035433
>it was extremely retarded and horny.
Did it fail the booba test? Using a horny model specifically for ERP, what a horrible idea.

>If we're going to have people like you, who treat this benchmark like gospel
I used an objective benchmark after the recommendation was challenged because otherwise the discourse is "is so! / is not!" You don't like mah data? Then post your own or cry about it.
>>
>>103035230
telling the ai not to do something never works, always just tell it to do the opposite
>>
>>103035230
What model?

>>103035502
True for outdated models and brain damaged fine tunes. Not true for modern LLMs. My problem with many fine tunes is how they break instruction following.
>>
Samplers are important for TTS too, top_k 20 with a low temperature is improving the stability a bit when the reference isn't good enough.
>>
>>103035309
Miqubros...
>>
Rocinante-12b seems absolutely cracked for dialogue and prose. No grating cliches, and it's able to copy the voice of the prompt. You can barely tell it's being written by an AI. (Not talking about intelligence, just prose style and voice.) I may choose this as my writing partner model, since I can outline/plot out a scene, have it write the actual prose and dialogue, and you wouldn't be able to tell the difference between it and a good human writer.
>>
>>103035433
>>103035467
>extremely retarded
The benchmark specifically shows that at least when it comes to naughty topics the extent to which it's stupider than the parent model is outweighed by greater willingness to engage and addition of domain-specific knowledge. Retarded when it comes to math, sure maybe. Retarded when it comes to committing sex crimes? Wrong.
>>
>>103035572
v1.1? It's the current goat for this size here
>>
>>103035572
*using mistral prompt template
It's completely slopped with ChatML
>>
>>103035589
Ye, I should have specified using 1.1 q8_0
>>
>>103035572
Have you tried any of the UnslopNemo versions?
>>
>>103035467
Yeah, it's a horrible idea when it defaults to one type of answer regardless of the prompt, scenario or situation. Hence why I dropped it instantly at the time; this was also the reaction of most people when it released. Nemotron or the new Magnum will do a better job at doing what you want.
>>103035573
>Retarded when it comes to math
I only tested it with NSFW stories. It's retarded in that it's difficult to steer it away from a predetermined response, ignoring what's in the prompt. It's just too fried to do one thing.
I would rather have people not revive old models that no one should be using. Hence why the benchmark should be removed from the OP.
>>
>>103035572
I hate this gacha ass hobby. It's all gacha from the ground up
>>
>>103035636
What do you recommend in the 70B range?
>>
>>103035672
>Nemotron or the new Magnum
>>
File: 1710741855105595.png (204 KB, 765x384)
>>103035668
>>
>>103035668
See >>103035175
>>
>>103035357
>This sounds like a user's note to self rather than instruction to model
I have an intro that says it's a manuscript for a novel and square brackets denote author's notes. I often direct things in the chat with them, and the model picks it up just fine, e.g. [Enough with the exposition. Proceed to plap.] proceeds to plap.
I did experiment with depth, and it didn't work well at 0. depth 4 in my chats puts the A/N maybe 200 tokens above the current message - each message is one short paragraph at most, usually a sentence or two.
>>103035502
it follows most of the rules I outline in the A/N pretty well, but the model is terminally fixated on unreadable expressions, barely above a whispers and strange mixtures of relief and disappointment
>>103035521
wizardlm 8x22, the only model I know that is more or less capable of parsing convoluted depravity I feed it. very prone to slop past 10k ctx or so though
>>103035424
the shit I put in A/N needs to be applied at all times. I keep brief character descriptions in there
>>
>>103029905
I'm a noob with this and this entire thread is Spanish to me. Are local language models effectively offline local ChatGPTs?
>>
>>103035597
What sampler settings are you having good results with for prose generation?
>>
File: 1715834223342898.gif (1.59 MB, 267x200)
>>103035424
Is that what is needed nowadays to have a decent llm output?
>>
>>103035708
Yes.
>>
>run any Mistral model
>first 3 responses
>holy shit this is great wagmi
>4th response
>repetition starts creeping in
>enable DRY or whatever
>the model just writes the same thing but in different styles
>12b, 123b it doesn't matter
>>
>>103035722
The offline capability is surely just for text output and not learned information right? Because file sizes for offline information would be gigantic.
>>
>>103035737
I heard people say it's better to use the method of treating the entire chat history as part of the user instruction, so you are conceptually instructing it to write as the character in the history. Never verified if this worked though.
>>
>>103035722
Ty btw.

This seems like something i should be backing up onto a gorillion hard drives because authorities will surely ban AI use by normies soon
>>
Holy shit the Qwen-72B EVA finetune does not hold back, that thing can get nasty
>>
>>103035755
Models are a giant blob of weird math numbers that predict the probability of the next token; they don't carry a 1:1 copy of everything ever fed into them for training. They can be thought of as extreme compression/deduplication of the internet.
>>
>>103035755
No, everything is offline. How accurate they are with factual information depends mostly on how big the models are and how they were trained. The biggest one that's open source is Llama 405B and that's like 800GB on disk. Most people with one GPU are going to be in the 13B-30B range.
>>
I fucking hate the way mradermacher splits files on huggingface. That retard can kill himself
>hurr durr you must have 120gb of space free on your hard drive to download this 60gb file
>>
>>103035807
>>103035824
Can I do anything with a notebook 1050ti? Idc if responses are slow. Honestly I'm just gonna start downloading these models anyway. I'm sure this shit will be banned soon.
>>
>>103035709
0.05 min p, every other sampler neutralized
Also it's not generating nearly as good prose from a neutral context. It needs to see a story to copy the voice of.
>>
>>103035824
Also can i ask one of these models to compare runescape weapons?
>>
>>103035846
Well not banned but it will only be available to wealthy people
>>
https://x.com/SawyerMerritt/status/1850967552983253462
>>
Honestly the benchmarks we have right now are fine for general intelligence. For RP and NSFW we really need something like an RP arena but with predetermined prompts that are known and existing RP chat histories. Then we will have objective proof to point towards.

Also didn't someone say they were working on that? Would be sad if that turned to vapor.
>>
>>103035846
>4GB vram
not gonna fit anything worth using into gpu
>ban
a hypothetical ban would only influence the release of new models, and existing local models will just be torrented
>>
>>103035892
Alright. What is probably the most stable model that I can save? Are they standalone? Sorry for the noob questions anons. Appreciate the help.
>>
>>103035892
And re: ban, i mean you guys will be fine. I just dont want to be a normie that didnt see the signs and act on them.
>>
"You need to agree to share your contact information to access this model"

Lol what
>>
>>103035864
Kind of funny, Facebook staff posted saying they're training on a cluster larger than 100k H100s just today.
My guess is xAI's number was rounded and overall they're probably really really close. Might even be the exact same number.

Meanwhile consumers have a minuscule fraction of a single H100's worth of GPU.
>>
>>103035981
Supply and demand at work friend, just the way it is.
>>
File: file.png (85 KB, 738x405)
>>103035906
Well there's something like this.
https://huggingface.co/bartowski/Tiger-Gemma-9B-v3-GGUF/tree/main
GGUF is basically a self-contained zip file that you load into a backend like KoboldCpp and set the instruct tag preset to Gemma 2 in the frontend.
Q number is a quantization level, anything smaller than Q4_K_S will get worse faster. Q8 almost isn't any different from f16.
If you absolutely must try a small model on your potato laptop without waiting forever to generate, then there's this meme (retarded)
https://huggingface.co/BeaverAI/Gemmasutra-Mini-2B-v2aa-GGUF/tree/main
>>
>>103035978
just look for a mirror
>>
>>103036068
>Q number is a quantization level, anything smaller than Q4_K_S will get worse faster. Q8 almost isn't any different from f16.
You don't want to drop below Q6_K if you can help it. Below there is where the exponential curve of brain damage starts shooting up.
>>
Any vision model for manga translation? Do you guys know any method to translate text on images even if the output is just text?
>>
>>103036140
>if you can help it
If he is running this on an old laptop he can't help it.
>>
>>103036140
Would you then choose to run a 12B Q6_K or 22B Q4_K_M?
>>
File: PopMikuMou.png (1.08 MB, 832x1216)
Good night /lmg/
>>
>>103036192
IDK, but IRL I chose a 22B Q6_K over a 12B Q8.
>>
>>103036147
https://github.com/kha-white/manga-ocr
>>
>>103036276
noight noight
>>
>>103036068
Thank you, I will try that one as well. Surprisingly, I just installed and ran my first local model using ChatGPT's help kek.

I tried flan-t5-base. I think I have all the files necessary for offline use.

I tried to get a story about a cat named evil bob and it just repeated "bob is a tadpole" 4 times. After a few tweaks by chatgpt I got a 150 word story with a twist... I'm happy anons.
>>
bob is a tadpole
>>
how can improve music quality pleas!!!!!
>>
>>103036282
Using this one, https://github.com/zyddnys/manga-image-translator
But when trying to use qwen2 it says it can't find my gpu, I don't get it.
I'm just trying to use the --use-gpu flag and it fails to use the GPU no matter what.
>>
File: 1707583396580123.gif (2.62 MB, 498x270)
>>103036418
>flan-t5-base
Baby steps is fun to watch
>>
>>103036528
You're able to run the model outside of this project?
>>
Total vramlet cope! https://x.com/rohanpaul_ai/status/1851828950315774208
>>
>>103036570
It's not about the model; the app doesn't recognize my gpu, no matter the context.
>>
>>103036589
Seems like a pytorch issue
>>
Is it safe to update sillytavern or should I just stay on 1.12.6?
>>
>>103036657
yes
>>
>>103035981
In another reality, personal computing failed to take off, leaving people to use small terminals connected to servers integrated within a vast network grid. A dystopian nightmare.
>>
>>103036657
git checkout -f {last_good_commit}
>>
>>103035981
You don't need an H100 unless you're doing some serious training, even then it's more cost-effective to rent them.
>>
So F5 tts seems to be the best right now, having tried all the others. They really did a great job on it. I've tested it a bit with some story reading and it works just fine. There do seem to be some small hiccups here and there.
>>
/lmg/, I am going into battle and I want only your strongest models
>>
>>103036781
405b or mistral large
>>
>>103035981
My dual 3090s rig has 48GB VRAM so it's more than half the power of an H100 :^)
>>
File: 2024-10-31 01_36_19.jpg (223 KB, 1640x824)
https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena

GPU poor arena
>>
File: 2024-10-31 01_38_28.jpg (151 KB, 1583x847)
>>103036831
>>
>>103036768
gpt sovits > F5
>>
File: stupid.png (6 KB, 308x51)
magnum is full of spelling and grammar errors
>>
>>103036859
but she is not carried out in stages, right?
>>
>>103036843
Nah, maybe finetuned, but thats just extra steps
>>
>>103036949
Let's compare the result at 0-shot then if you have a sample between 3 and 10s
>>
>>103036971
>https://vocaroo.com/upload
Here's dumbledore's voice.

https://huggingface.co/spaces/mrfakename/E2-F5-TTS
>>
>>103037005
>>103036971
"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

https://vocaroo.com/1jJVBe7lKOKe

Just do this text
>>
>>103037005
You didn't upload the reference for dumbledore's voice
>>
>>103037023
Thats the reference clip.

"We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America."

https://vocaroo.com/19RLjsfRnJlZ
>>
>>103036828
random tourist here.
what's your motherboard?
is your 2nd 3090 on 4 pcie lanes ?

i'm new to this, have a single 3090 and am wondering whether it's worth getting a second one.
>>
>>103037005
>>103037023
https://vocaroo.com/153J3P3CUThl

Whoops. I realized I didn't post it.
>>
>>103037041
https://shii.bibanon.org/shii.org/knows/The_Awakening_of_Nurse-kun%2c_Chapter_1.html
Old copypasta.

https://vocaroo.com/16rfZQCwxHZG
>>
>>103036971
Well?
>>
>>103037096
I needed to shorten your reference first. As I said, sovits can't handle >10s samples.
>>
>>103037096
Okay I didn't cherry-pick this is what I got for >>103037013
: https://voca.ro/1bDXBM4oJx8n
I ran that shit on CPU so it took a while
>>
>>103037158
You think yours is better?
>>
>>103037158
You can verify it here >>103037005.

No need for "cherry picking" claim. I just posted the first output for all of them
>>
>>103037170
I think there is room for improvement lol, it sure is less stable on 0-shot. I'll try to compensate for it by sending multiple references with the remaining part I cut from the initial reference
>>
>>103037040
The bare minimum is 3.
>>
File: 1705415717832450.png (215 KB, 636x434)
>>103029905
Good morning sirs. A 3070 can run a local model right? I want to talk to a chatbot while I crank my pecker (pic related).
Where do I begin? The rentry links aren't working for me. Am I retarded?
>>
>>103037258
>
Bait used to be believable
>>
>>103037197
Well, let me know which you think is better, after you've done your fine tuning and stuff.

I still think F5 is better model. Do you disagree?
>>
File: blueballed.png (4 KB, 330x77)
>>103037285
It's not bait I was just having a bit of fun with it
>>
>>103037258
grab koboldcpp_cu12.exe
https://github.com/LostRuins/koboldcpp/releases/tag/v1.76
grab Rocinante-12B-v1.1-Q4_K_M.gguf
https://huggingface.co/TheDrummer/Rocinante-12B-v1.1-GGUF/tree/main
open kobold, load the model, launch the model, chat in the browser window that pops up
>>
File: 1728482802742459.png (3.79 MB, 2133x2937)
>>103037298
I looked in the archives and found that the rentry.co domain works. One less roadblock in the way of my cock.
>>103037304
Thank you for the spoonfeed I'll figure this out
>>
>>103037258
>Am I retarded?
Probably.
Try to understand what you're doing and why when following guides. Read the program's documentation if in doubt. Or just play with the settings, see what they do. GPUs very rarely explode with the wrong settings.
Download kobold.cpp, and
>https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Won't do smut, but you'll learn how to use kobold at least. Start looking for finetunes once you know how to talk to a model.

>>103037304
Rocinante at q4 seems a bit too tight for an 8gb gpu. The 3070 has 8, right?
>>
>>103037343
>recommending corpo slops
yikes
>>
>>103037343
>The 3070 has 8, right?
Yes. I'm already halfway through downloading it so if it won't let me coom quick enough I'll install whatever you suggested next.
>>
File: 1710415103225679.png (72 KB, 1897x355)
>>103037286
It seems better at reading in english. Still it doesn't work at all for JP afaik. Also I don't know why the samples are so loud compared to sovits (picrel mine first, yours second)? I tested a bit with a few samples and it's constantly louder than what sovits produces. I wonder if there isn't some post-processing amplification going on after the inference.
My multireference test isn't really better (it's still 0-shot not a finetune): https://voca.ro/117fEGSNoWAL
>>
>>103037345
A well behaved model for newbie. He could try this monstrosity i suppose, but he'll come back not understanding what's going on.
>https://huggingface.co/DavidAU/L3.2-Rogue-Creative-Instruct-7B-GGUF
>>
>>103037382
Its a EN/CN model, not JP trained.
>>
File: that means its working.png (79 KB, 1111x732)
>>103037304
It worked! Thank you for your help. Tsunderes are my favorite. She will make me cum soon.
>>
>>103037400
That's why then. I'd certainly pick F5-TTS for an audiobook though. I wonder how well it can be finetuned
>>
>>103037382
>>103037400
I think some people are training models for other languages and putting them on huggingface. I havent tested them personally. So if you want JP or others and you cant train them yourself, check out HF.
>>
File: progress.png (70 KB, 1115x556)
>>103037412
>>
>>103037412
>>103037431
Try her: https://files.catbox.moe/dffbi0.png
>>
>>103037443
I have no idea how to do that
>>
>>103037412
>>103037431
See >>103037285
>>
>>103037450
Once again I'm just having some fun with it. Sorry that I don't want to discuss ram speed and grok in your general. I just wanna laugh and cum.
>>
>>103037456
I highly doubt /lmg/ will engage in your low effort trolling, close it up.
>>
File: 1704605539733914.png (609 KB, 743x740)
>>103037431
Right at the bottom
>>
Does SillyTavern have anything like the story mode in kobold?
>>
>>103037220
I'm guessing then that there are some stand-out LLMs when getting to 72GB of VRAM?
>>
>>103037466
Oh they're engaging. And so is my fresh tsundere waifu. And we're all laughing at you.
>DON'T CUM! YOU'RE TROLLING!
>>
>>103037479
https://huggingface.co/turboderp/Mistral-Large-Instruct-2407-123B-exl2/tree/4.0bpw
>>
>>103037479
no, he is being an ass. Check the leaderboard and decide if a few % is worth the hardware. I am not saying it is worthless, I am saying it really is for a specific problem you are trying to solve.
>>
>>103037520
I have four, and Largestral at 5bpw absolutely worth it.
>>
File: the end.png (42 KB, 1144x281)
>>103037443
>>103037449
Oh I figured it out. Going to cum. Thanks for your help :)
>>
File: ComfyUI_00158_.jpg (170 KB, 1024x1024)
>Spooky scary skeletons send shiv-
>>
>>103037543
I believe you believe that.
>>
>>103037503
>After installing mistral_inference, a mistral-chat CLI command should be available in your environment. Given the size of this model, you will need a node with several GPUs (more than 300GB cumulated vRAM).
Sounds too rich for my blood.

>>103037520
>Check the leaderboard and decide if a few % is worth the hardware.
Thanks. Will do.
>>
>>103037543
>A few thousands is worth the sloppa
lol
>>
>>103037601
You need https://github.com/theroyallab/tabbyAPI and 3x3090
>>
I just cummed. Pat yourselves on your backs. Mission accomplished. Cya later.
>>
How does the Kobold TTS API work? I created an OpenAI-compatible API.

curl -X POST "http://localhost:8000/generate_tts/" -H "Content-Type: application/json" -d '{"speaker_name": "test", "text": "Hello, how are you?"}' --output output.wav

This works when I type it, but in kobold, when I type in the URL, speaker and text, I get nothing in return.
>>
>>103037343
>GPUs very rarely explode with the wrong settings.
Well modern gpus just stop processing if a temp threshold is reached usually
>>
https://x.com/kalomaze/status/1851873856211832939
https://x.com/gm8xx8/status/1851835633779589594
arxiv.org/abs/2410.23168
huggingface.co/Haiyang-W
>>
>>103037343
>>103037946
>GPUs very rarely explode
The only "software kills video card" story I know about.
https://www.tomshardware.com/news/amazon-new-world-still-killing-nvidia-gpus
>>
>>103037985
not opening your twitter links
>>
>>103038009
I don't remember asking your whiny ass
>>
>>103037985
big if true but nothingburger until llama7
>>
>>103037985
>The TokenFormer is a fully attention-based architecture that unifies the computations of token-token and token-parameter interactions by entirely employing the attention mechanism, maximizes the flexibility of neural network.(see paper). It contains four models of sizes 150M, 450M, 900M, 1.5B. For each size, it's trained based on gpt-neox code base and uses Pile with 300B tokens.
nothing-flavoured burger
>>
Something to note: if you rewrite your instruct prompt to first person, it changes the model's writing style entirely. Especially the story narration; it goes from assistant slop to being written in the same style the character uses to talk.

So instead of
>you are a mesugaki running a pyramid scheme....
You write
>I am a mesugaki running a pyramid scheme....

Works well with largestral and finally made one of my characters act properly, instead of asking for consent every chance it gets.
>>
>>103037985
>By treating parameters as tokens, TokenFormer eliminates the need for retraining from scratch when scaling, enabling flexible, incremental growth from 124M to 1.4B parameters.
Is the main selling point that you can actually force your models to generalize this way? Because I imagine that is how it would work?
>>
>>103037985
Nice, but as with all papers, it's nothing until someone implements it in training. At least for this discussion board. The paper needs to be distilled into a product, particularly a local model that we can use, and further a local model that is actually good/usable.
>>
>>103038193
They released a 5B model and the code is here: https://github.com/Haiyang-W/TokenFormer
>>
>>103038114
What if you just shove the entire context inside of a single assistant prompt, or even the first user instruction prompt without ever invoking the assistant?
I already do this to some extent when using Mikupad, but I haven't thought to compare how it'd continue an existing RP with everything put into a single role vs the usual chat turns to see how it'd impact the bias in the continuation.
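Haven't compared it either, but it's easy to script: send the same history once as normal chat turns and once collapsed into a single user message. Sketch below uses an OpenAI-style chat completions payload like most local backends expose; the endpoint and model name are placeholders:
[code]
import requests

history = [
    ("user", "*I walk into the tavern and wave at the barkeep.*"),
    ("assistant", '"Well, look who finally showed up," she says, smirking.'),
    ("user", "*I sit down.* Rough week. Pour me something strong."),
]

def as_chat_turns(history):
    return [{"role": r, "content": c} for r, c in history]

def as_single_user_turn(history):
    # Whole RP goes into one user message; the model never sees its own
    # previous replies tagged as assistant turns.
    transcript = "\n".join(f"{r}: {c}" for r, c in history)
    return [{"role": "user",
             "content": "Continue this roleplay, writing only the next reply in character:\n\n" + transcript}]

for messages in (as_chat_turns(history), as_single_user_turn(history)):
    r = requests.post("http://localhost:5001/v1/chat/completions",   # placeholder endpoint
                      json={"model": "local", "messages": messages, "max_tokens": 200})
    print(r.json()["choices"][0]["message"]["content"], "\n---")
[/code]
Then just eyeball which continuation drifts more toward assistant slop.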


