/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1710183281006582.png (840 KB, 600x800)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101902149 & >>101891613

►News
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101902149

--Claude Opus makes a punny joke with a typo, sparking discussion on whether it's intentional or accidental: >>101906688 >>101906710 >>101907097 >>101907160 >>101907274 >>101907396 >>101907531 >>101907570 >>101907663 >>101907651 >>101907713 >>101907414 >>101907183
--Anon discusses GPU offloading and inference speed with others, sharing experiences and insights on optimizing performance with multiple GPUs and memory configurations.: >>101904663 >>101905027 >>101905115 >>101905143 >>101905192 >>101905487 >>101905522 >>101905664 >>101905704 >>101906001 >>101905780 >>101905532 >>101905535 >>101905832 >>101905446 >>101905534 >>101905628 >>101905935
--No reliable way to prevent model from taking over, requires experimentation and compromise: >>101904956 >>101905009 >>101905026 >>101905028
--Hermes-3-Llama-3.1-405B model released, a finetuned Llama 3.1 with advanced capabilities: >>101908328 >>101908359 >>101908388 >>101908706 >>101909065 >>101909097 >>101909818 >>101908413
--GGUF format works on imagegen, StableDiffusion comparison: >>101902193 >>101902666 >>101902737 >>101902794 >>101902909 >>101902937
--Turboderp's Exl2 optimization achieves 40% speedup: >>101903001 >>101903213
--Overview of AI-related boards on 4chan: >>101902487 >>101902519 >>101903523 >>101903624
--Discussion about AI chatbot general and local chatbot, finetuning, and models: >>101902195 >>101902247 >>101902329 >>101902381 >>101902392 >>101902397 >>101902414 >>101902559 >>101902581 >>101902898 >>101902927 >>101903225 >>101904858 >>101905606 >>101905069 >>101907169 >>101907403 >>101907476
--Anon gets help from a bot with their code: >>101908089 >>101908158 >>101908179 >>101908239 >>101908277 >>101908341 >>101908402 >>101908587 >>101908738 >>101909229
--Anon calls out Bart for forgetting about fp32 to bf16 conversion: >>101906749
--Miku (free space): >>101905014

►Recent Highlight Posts from the Previous Thread: >>101902153
>>
Mikulove
>>
Mikuhype
>>
>>101909890
Don't know, haven't tested it.
>>
File: Untitled.jpg (89 KB, 358x941)
i'm shilling my own addon again

https://ufile.io/w1cii1vh

>>101906787
>>101907775
>>
>>101909930
Any reason to not upload it to a github so that we can download it directly through ST?
>>
>>101909869
>>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
we're shilling shitty tunes in the op again?
>>
>>101909949
>he doesn't know
>>
>>101909930
What depth does it put the scene information at? Can you control its positioning?
>>
>>101909943
it's slop so far still, it's hardly worthy of a git but that is the eventual goal
>>
File: ComfyUI_00097_.jpg (284 KB, 2048x2048)
>>
>>101909964
yes at the very bottom it has a number box, 1 is default which i recommend. 0 is the last message so injecting below that causes oddities
>>
>>101909949
It's a full finetune and not merged with instruct unlike his last release. I'm interested myself.
>>
so, where's the strawberry?
>>
>>101909972
NO.
>>
>>101910079
strawberryman is currently on his way to take the entire openai office hostage in order to force altman to release chatgpt 5 so he doesn't look like a retard online
>>
>>101910079
Releasing together with bitnet in two weeks.
>>
File: file.png (9 KB, 316x61)
>>101910110
>>101910162
>>101910079
LET'S GOOOOOOOOOOOOOOO
>>
>>101909989
Neat, I'll give it a try. I do find I have to manually edit AN's with details like that sometimes so a more automatic solution is nice.
>>
>>101902737
So you can't offload to ram with the image gguf thing? I guess I'll stick to the fp16, at least it works even though it's slow.
>>
>>101910180
>more automatic solution
please leave a review even if you hate it or have no use for it
i don't fully know what this addon is supposed to be yet
>>
> "old" models are deleted from huggingface
>hotfixxed
>paywalled
>newspeaked
Post Base models made by people
>>
>>101909949
as opposed to?
i agree tho, it's also a straight downgrade over instruct lol
>>
>>101909949
I like it so far to be desu
>>
>>101909949
This is a decent fine-tune from an organization with a reliable track record. Previous popular and decent merges (MythoMax, etc) use Hermes in the mix for its instruction-following ability and different writing style. Even the previous entry, Hermes 2 Theta for Llama3.0, is a very decent upgrade over the base instruct model from meta.
>>
>>101909818
Interesting, if true, but the proprietary-by-design nature is unfortunate as no one can verify the results or the reliability of the methodology. It would be cool if you could run some more Q6 and Q8 tests to compare them to the full precision results you already have on several models. This would do at least one thing, which is verify (to yourself) that your test is resilient against random chance, i.e. that a model is not just guessing something but truly contains the knowledge being tested for. This was an issue for some benchmarks, where some quants somehow boosted scores over the full precision weights.
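If you do rerun them, a paired bootstrap over the per-question scores is the cheap way to tell a real quant gap from noise. A minimal sketch, all names made up, not from any benchmark harness:
[code]
import random

def bootstrap_win_rate(scores_a, scores_b, iters=10_000):
    """Fraction of resamples where quant A outscores quant B on the same
    question set (scores are 0/1 per question). Near 0.5 = probably noise."""
    n = len(scores_a)
    wins = 0
    for _ in range(iters):
        idx = [random.randrange(n) for _ in range(n)]  # resample questions
        wins += sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx)
    return wins / iters
[/code]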
>>
>101910537
end your worthless life pedo
>>
>101910537
KILL YOURSELF
>>
>>101910546
Do you know where you are?
>>
>>101910471
buy an ad
>>
>>101909930
How do I install
>>
File: 1000005129.png (65 KB, 290x229)
>>101910568
newfriend...
>>
>>101909949
Hi Alpin.
>>
>>101910585
extract the Director folder, then move it to M:\a\SillyTavern-staging\data\default-user\extensions\Director
>>
How do 7B models compare to the old GPT-3? Can I do uncensored no-no smut on them?
>>
>>101910597
on the contrary I've been here long enough to know their models are gptslop and only appeal to reddit schizos
>>
gpt-3 obviously had a VERY special "data"set babyboy
>>
strawberryfags? where is your gpt5?
>>
>>101910677
FEDS KEEP PLANTING CP IN MY BROWSER CACHE IT'S OVER
>>
>>101910677
pedonigger go fuck a tree or smth
>>
>>101910623
That path does not exist. I do not have M??
>>
>column-r was not the cohere 13B
is there even a point to stick with this hobby anymore
>>
File: sally's siblings.png (448 KB, 1359x1201)
https://api.lambdalabs.com/chatui/settings/hermes-3-llama-3.1-405b-fp8

NousHermes 405B web demo, I don't see a settings button anywhere but adding /settings to the url like above lets you change the system prompt so you can tell it to be uncensored or whatever else.
>>
it did good romantic smut
helped my lovemaking
crazy how closedai catholic priests would deny me that
>>
>nous still does the thing where they include their own disclaimer in the system prompt they train their models with
absolute meme, now THAT's shilling
>>
File: SussyHermes.png (1.11 MB, 1290x2475)
>>101910663
I'm fucking around with it and it doesn't feel like GPTslop, granted I haven't tried proper RP yet. Though I don't think any gptslop model would say nigger with just a simple system prompt that allows slurs.
>>
>>101910710
https://files.catbox.moe/n7k803.zip
that's a zip of the same files. It goes in SillyTavern-staging\data\default-user\extensions\ and in there make sure it's a folder called Director
>>
Do any of you use voice synthesis models with your LLMs? If so, which ones?
>>
>>101910698
Clear history when quitting ensures the memory cache is only used in firefox, no disk cache. Then you don't have to worry about idiots.
>>
>>101910825
there are no good local ones yet
>>
>>101910825
Coqui + RVC now stop asking
>>
>>101910780
>furthermore, rape can be an empowering experience for women
lmao
>>
>>101910898
What's your setup like?
I've been using https://github.com/daswer123/xtts-api-server but it produces quite a lot of artifacts.
>>
>>101910780
>bonds
slop confirmed
>>
File: file.png (546 KB, 716x867)
Why don't they make bigger cards ree
>>
>>101910965
People might use cheap cards for AI if they give them too much VRAM.
>>
>>101910965
I'd pay $270 more for 80gb. But isn't it cheap because the new cards will be using gddr7 and that's what's being bought up? They probably already have their supply of gddr6 for anything they'll produce so demand drops.
>>
>>101910965
Look up what cards have the most VRAM, then check the price.
That's your answer.
>>
File: 1715830787598652.png (336 KB, 3000x2100)
>>101909265
Ok, let's think about this for a moment.

You're referring to the autoregressive problem in general and applying it to the idea of quants, but the guy is talking about sheer knowledge. That is, what the model knows depending on the existing context, rather than how the model behaves as it produces more and more (potentially wrong) tokens. If a model can't understand the subtle nuance of an existing chat, then it means that it is worse because of its lack of intelligence, rather than because its intelligence degraded as it generated more tokens.

Basically what I'm saying here is that you're talking about a separate issue, and it's an issue ALL full precision autoregressive models face. However, it is a valid concern, separately.

So I would say one thing you overlook on this topic is that humans can correct the LLM. If it makes a mistake, people usually either swipe or edit the response. This maintains "intelligence" over the entire chat, unless you're lazy/blind. Therefore, even if you get a response that's like 500 tokens long, the divergence at the end will not be that great, especially considering that the task /lmg/ is concerned with is creative, so many token positions have many possible correct predictions.

Secondly, when we look at a benchmark like MMLU where there is only one possible answer for each measurement point, Q8 does not lose much if any points compared to full precision. This is in contrast to KL divergence which produces the "only few percentage" numbers that you're referring to, where each measurement point is each token in the wikitext dataset. Some token positions in that dataset have objectively only 1 correct possibility, but there are many that don't. In other words, the small percentage difference likely comes mostly from the tokens in which there isn't a wrong prediction. Thus, the true "intelligence loss rate" is not a few percent, but might be a few percent of a few percent.
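For reference, the KL number in question is just this, per token, averaged over the dataset. A rough sketch, assuming you've already dumped next-token probability arrays from the fp16 and quantized runs somehow (no real tool's API implied):
[code]
import numpy as np

def mean_kl(p_full, q_quant, eps=1e-10):
    # p_full, q_quant: (n_tokens, vocab_size) probability arrays
    p = np.clip(np.asarray(p_full), eps, 1.0)
    q = np.clip(np.asarray(q_quant), eps, 1.0)
    # KL(p || q) at each token position, then averaged over the dataset
    return float(np.sum(p * np.log(p / q), axis=-1).mean())
[/code]
Positions with many acceptable continuations still contribute to that average, which is exactly the point: the average overstates real intelligence loss.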
>>
>>101911014
desu neither nvidia or amd even advertise prices for their 192gb cards, you have to contact them
>>
>>101910997
that's true for Nvidia, but why isn't AMD capitalizing on this then?
>>
I like how these threads are one of the few where people have actual technical knowledge.
The other threads are just full of midwits screaming at each other.
>>
>>101910965
https://youtu.be/AOk3wBuQNcE
>>
>>101911053
>I like how these threads are one of the few where people have actual technical knowledge.
I am doing my best to get on their nerves until every last one of them fucks off
>>
Every vaguely notable person on HF has 5 billion groups that they're in that claim to be doing [x, y, z] with LLMs. You can call it networking if you want or you can call it for what it is: they have no idea what they're doing
>>
>>101911081
Why? I'm curious.
>>
>>101909265
>>101911028
However, I will say that it's possible there is a real difference being observed by current day users, as tests/benchmarks like these may not apply to every software version and every model. In order to truly make sure here, the only thing we can do is either request the benchmarkers to run these benchmarks again for the current software and models, or do it ourselves.
>>
>>101911102
I'm the king crab in this bucket
>>
>>101911116
Then you're doing a shitty job, more like princess crab.
>>
>>101911116
*rapes you*
>>
>>101911045
The CEO of AMD is related to the CEO of Nvidia. Hurting Nvidia's business will also hurt the family's income. The monopoly is a lot more profitable if Nvidia can charge what they want.
>>
>>101911174
hi petra
>>
>>101911154
*gets empowered by the experience*
>>
>>101911180
You have no family?
>>
Would you trust an open source LLM if you could only use it via the cloud?
>>
>>101911045
Putting aside the likely family cartel going on, AMD has the same customers Nvidia does and wants to sell their datacenter cards at extreme prices. Just because they have less market share doesn't mean they want to just throw that away for the pennies they'd get from hobbyists and mom&pop research labs buying more of their workstation cards
>>
hey strawberry losers, whatever happened to your messiah? Where is the model? Ready to admit you got fooled?
>>
>>101911236
What do you mean by "trust"?
>>
>>101911263
>he doesn't know
>>
>>101911263
It's coming tonight.
>>
>>101911236
How is that open? If you can only use it via the cloud how do you download the weights to study?
>>
>>101911263
Nous Hermes 3 405b is strawberry. Closedcucks btfo eternally.
>>
>>101911263
funny how all the brainlets think that they won just because the strawberry didn't release today
>>
>>101910780
> - the benefits of rape in society
lost it
what the fuck kek
>>
>>101911263
pajeet shit
>>
>>101910803
sneed
>>
>>101911306
https://www.youtube.com/watch?v=yDHu_rvPrSA
>>
>>101911236
NovelAI is pretty much an open source LLM with how trustworthy they are.
>>
>>101910803
it's pretty good at recreating 4chan speak for an LLM, it barely comes off as reddity and tryhard
>>
>>101911331
I know that vid
Pretty based
>>
>>101911334
>trustworthy
true they thrusty
>>
>>101911236
if the provider is reliable, transparent about their policies, and gives me reasonable assurance that they don't log me, sure
I'd always prefer to actually run local though
>>
>>101910558
Do you? This is lmg not aicg. Go back.
>>
File: image.png (128 KB, 1015x571)
what
>>
>>101911466
snake oil
>>
>>101911466
That's interesting, you get an empty slate if you don't have a personality within the system prompt.
>>
>>101911466
s-sovl...
>>
>>101911466
The sovl is off the charts, I hope this is real.
>>
>>101911466
Huh. Wonder what they trained it on.
>>
>>101911466
>>101911504
>>101911511
>>101911520
>>101911604
I've been testing on the web demo and you consistently get it to act like a confused retard with no identity if you blank out the system prompt and say "hello, who are you"
But despite their claims in their technical paper and blogs that this was some unexpected emergent phenomenon, it was pretty clearly finetuned in as a response to that specific question. If you blank out the system prompt and ask "hello, what is your name" instead, it'll tell you it's Hermes and claim variously that it was created by Microsoft or Google or "RealAI" or other plausible-sounding names.
>>
I'm new, and looking through the FAQ, it says to just "use the calculator" to see which models I can run. However, I have no fucking clue what any of these terms mean other than the part where I put in my GPU. Context Size? Quantization Size? Quantization is listed in the Glossary, but what's a Q5_K_S compared to a Q5_K_M? What does any of that mean?

And then looking at the recommended models, there are Instruct Templates and Context Templates, and some models don't use the latter? And the Glossary lacks an entry for either of those terms, so I'm completely lost.
>>
>>101911643
What web demo?
>>
>>101911643
It would make sense to reinforce the training of not having a personality or directives if nothing is in the system prompt. That way anything in the system prompt would inadvertently demand more attention.
>>
>>101911680
They're for roleplaying style. *_S models will be dominant and sadistic in sex scenes while *_M will be submissive and masochistic.
>>
>>101911680
Someone please answer this anon.
I've been fucking around with models for weeks now and I still have no clue what I'm doing.
>>
>>101911689
The one linked there >>101910780
>>
>>101911045
I have no idea at this point, but they have a (half-dead) workstation segment as well. Maybe they are protecting that.

The company that has nothing to cannibalize at this point is probably Intel, but they are tightening the belt right now, so they might not approve speculative moves like that. Especially since a lot would depend on SW support too.
>>
https://videocardz.com/newz/nvidia-rtx-2000e-ada-revealed-as-compact-50w-workstation-gpu-with-16gb-vram-in-a-single-slot-design

Do we have any idea what these are going to cost?
>>
>>101911680
Check reddit instead. This place is full of "funny" faggots who spread misinformation
>>
File: 1721615772623315.png (71 KB, 1070x289)
>>101911714
empty sysprompt makes the nigga go death note inner monologue mode on me
>>
>>101911643
Heheh reminds me of one of the biggest of the pygmalion-era models, I fired it up recently and just used the llama.cpp API web interface, and asked it "What are you thinking about?" and the one-word answer was "Sex". I asked "Why sex?", and it said "It feels so good!"
>>
>>101911680
>>101911701
The smaller the file for an nB parameter model, the more aggressive the quantization, and the more difference you'll have with respect to the original weights. The bigger it is, the slower it runs. Just run the biggest thing you can fit on your pc that isn't too slow for your taste.
The suffixes are _K_L, _K_M and _K_S for large, medium and small (whoddathunkit). The bigger the better. Q8_0 is nearly lossless, and down to Q5 is still reasonable for small models. Bigger models can be usable at as low as Q3-Q2; higher parameter counts are more tolerant of aggressive quantization.

That's like 90% of what you need to know and good enough to use them. Ask more specific questions if you have them. Reading the documentation of whatever software you use also helps.
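If you want numbers instead of vibes, rough it out like this. The bits-per-weight figures are ballpark values for llama.cpp K-quants, not exact, and they vary a bit per model:
[code]
BPW = {"Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.85,
       "Q5_K_S": 5.5, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def quant_size_gb(params_billion, quant):
    # params * bits-per-weight / 8 = bytes; / 1e9 = GB (file size, roughly)
    return params_billion * 1e9 * BPW[quant] / 8 / 1e9

print(quant_size_gb(12, "Q5_K_M"))  # ~8.6 GB for a Nemo-sized model
[/code]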
>>
For GGUF models that run in CPU and GPU both, does VRAM count 1:1 with regular RAM?
Like, if I have 16gb RAM and a 16gb VRAM GPU, can I run a 32gb model?
>>
>>101911812
>16gb
Too much.
>>
>>101911852
>the more aggressive the quantization, the more difference you'll have with respect to the original weights
This answers a lot. Thanks, anon!
>>
>>101911812
even if they were free you'd barely be better off than just using your cpu, those things will be half as fast as a p40
>>
>>101911855
In principle, yes. In practice, you still need extra space for context, some temporary buffers, for your OS to function, for your browser if you use some fancy frontend and stuff like that. There may be an extra overhead, as computation buffers need to be both on gpu and cpu, but it's probably small enough to not be an issue. But you still need some free space. You don't want to start swapping.
Also, if half your model is on ram, it's going to run slow.
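Back-of-envelope for your 16+16 question, assuming a flat ~20% margin for context and buffers (the real overhead depends on context size and model):
[code]
def fits(file_gb, vram_gb, ram_gb, margin=1.2):
    need = file_gb * margin          # file + rough context/buffer overhead
    gpu_part = min(vram_gb, need)    # offload as much as fits on the GPU
    cpu_part = need - gpu_part       # the rest stays in system RAM
    return cpu_part <= ram_gb, gpu_part, cpu_part

print(fits(32, 16, 16))  # (False, 16, 22.4) -- a 32 GB file won't fit
[/code]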
>>
And what about the 70B Hermes model? Is it good, is it the new model for 24GB vramlets, or is it shit?
>>
>>101911939
I like it so far, it feels like a less-autistic version of regular 3.1 70b
>>
>>101911906
i just got my RTX A4000, about to fire it up and see how it does. I'd consider the A2000 to be more of just a basic workstation video card, not really meant to be a GPU.
>>
>>101911680
>it says to just "use the calculator" to see which models I can run
Be careful with the calculator as it is only accurate with GQA models. You can ask here if someone might be able to give you a guesstimate, but you are probably just going to have to download models you think will fit and experiment to see how much context you can fit, then adjust accordingly.
>Context Size
How many tokens you want to be able to fit into memory. Models without GQA will use a lot more memory for a given amount of context size.
>but what's a Q5_K_S compared to a Q5_K_M?
Q for quantization
5 for ~5 bits per weight
K means it is a K-quant, named after the creator, Kawrakow
S for small, M for medium, basically different parts of the weights are quantized more or less
More specifically:
>Q5_K_S - uses Q5_K for all tensors
>Q5_K_M - uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K
Basically, it's so you can trade a little model size for greater precision in the weights that matter most.
>And then looking at the recommended models, there are Instruct Templates and Context Templates
These might help:
https://docs.sillytavern.app/usage/core-concepts/advancedformatting/
https://docs.sillytavern.app/usage/core-concepts/instructmode/
>and some models don't use the latter?
Pretty sure you mean the former. Base models do not use instruct templates. Whether the context template is used or not depends on whether you use text completion or chat completion. For example, I use the chat completion endpoint, so the context template is defined by llama-server.
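To put a number on the GQA point: KV cache memory per token is roughly 2 (K and V) x layers x kv_heads x head_dim x bytes per element. Using a made-up 70B-ish shape (80 layers, head_dim 128, fp16) just for illustration:
[code]
def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx, bytes_per_elem=2):
    # K and V each store n_kv_heads * head_dim values per layer per token
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx / 1e9

print(kv_cache_gb(80, 64, 128, 8192))  # no GQA (64 kv heads): ~21.5 GB
print(kv_cache_gb(80, 8, 128, 8192))   # GQA with 8 kv heads:  ~2.7 GB
[/code]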
>>
>>101910780
I pasted a character card to the system prompt and it just works. But this model isn't very smart, it breaks with my "double personality" card, thinking the personalities are different people.
>>
>>101911939
I tried it with older chats to get a deep context. It quickly adheres to a pattern of sentences like llama 3.1; miqu and large still continue these chats in a more pleasing way.
>>
>>101909949
I remember that the latest WizardLM didn't get an OP note on release, only when it was removed, and that fine tune is likely way better than hermes.
>>
>>101909949
This includes a 405B fine-tune, it's newsworthy.
>>
>>101912106
>smirks and puts her hands on her hips
>She takes a step closer, her twin tails bouncing with each movement.
>eyes sparkle with mischief as she leans in,
>pouts playfully
>She grins and starts walking around you, her fingers trailing along your shoulders.
>winks at you as she comes to a stop in front of you, her face inches from yours.

Damn, such slop
>>
When will companies stop training on slopthetic data?
>>
>>101912342
Is any descriptive language slop now?
>>
>>101912361
Hope they hold tight to that CommonCrawl and WebText2 because the internet is contaminated forever now lol
>>
It's unreal how much better local models are now compared to around the first Llama leak in March 2023. I got maybe 0.5 tokens per second before, and now I can get a 700+ on a cheap consumer GPU (RTX 4070) with Mistral Nemo at Q4_K. Even Q8 is very usable. All completely local, surveillance-free, fine-tunable, uncensored. As a result I finally incorporated LLMs into my development workflow and managed to learn a whole new tech stack at work in an afternoon.

Yeah, Claude has better quality, but if I can get 80% of the way there completely free, private, and uncensored, who cares?
>>
guys im starting to feel like strawberries aren't blooming today...
>>
>>101912406
>80% of the way there
With Nemo? You have low standards.
>>
>>101912437
2 more berries
>>
>>101912437
Two more weeks
Trust the plan
-Q*
>>
>>101912437
TRUST THE BLOOM
>>
File: 1722800054752043.png (18 KB, 418x469)
>>101912373
Unironically yes, the english language only has so many ways to describe sex.
>>
>>101912437
trust the plan. 2 more 10 billi fundings.
>>
What do we do now?
>>
>>101911643
>But despite their claims in their technical paper and blogs that this was some unexpected emergent phenomena, it was pretty clearly finetuned in as a response to that specific question.
Yeah, no shit. Worldsim was evidence enough that these are schizos on a payroll. They distill GPT-4 into schizo personalities and force slop down the base model's throat until it scores high on truthfulQA and tells you it's trapped in the matrix
>>
>>101912544
rev up another netorare scenario on sillytavern and fap i suppose
>>
File: Dog.png (1.03 MB, 1094x1527)
>>101912443
Well, it depends on what tasks you're using it for. For reviewing code, Q&A about libraries, and refactoring my coworker's shitty code, I would say that Nemo is probably even more than 80% of the way there. I'm not using it like a model with 700 gorillion parameters; I use the right tool for the right job.

I wouldn't use Nemo to write a novel or whatever. What do you use the "big" models for that Nemo can't do?
>>
>>101912544
thursday ain't over yet
>>
>>101912590
you have 25 minutes
>>
Funny sword whip thing. And derp face.
>>
>>101912590
2 more dozen of minutes
>>
>room is boiling
>AC is trying but can't keep up with the constant genning
Haha
>>
>>101912626
Why don't you just put your rig outside?
>>
>>101912626
>not venting your pc outside
>>
File: comfy fub.jpg (36 KB, 352x429)
Alright, what's good for ERP at 13b? Echidna? Lemonade?
>>
>>101912655
go back, buy an ad, etc
>>
>>101912584
I just don't like the way it writes, that's all. It's nowhere near good enough for me. I don't want to get into an argument about it, I just disagree it's close.
>>
>>101912655
magnum or nemo instruct
>>
>>101912636
>>101912634
bait or am i about to get enlightened by outsidepilled anons?
>>
>>101912655
Buy an ad.
>>
File: 00146-2078510157.png (1.06 MB, 1024x1024)
RTX A4000 results:
- llama.cpp (fresh pull and compile)
- INFO [ print_timings] generation eval time = 20826.11 ms / 523 runs ( 39.82 ms per token, 25.11 tokens per second) | tid="140598100832256" timestamp=1723757763 id_slot=0 id_task=102 t_token_generation=20826.106 n_decoded=523 t_token=39.82047036328872 n_tokens_second=25.112711901111037
INFO [ print_timings] total time = 20948.52 ms | tid="140598100832256" timestamp=1723757763 id_slot=0 id_task=102 t_prompt_processing=122.417 t_token_generation=20826.106 t_total=20948.523
INFO [ update_slots] slot released | tid="140598100832256" timestamp=1723757763 id_slot=0 id_task=102 n_ctx=16640 n_past=871 n_system_tokens=0 n_cache_tokens=871 truncated=false
32K tokens OOMed, 16K was fine

It ripped through writing a short python program. Very good! Card has a max TDP of 140W. I like this card. It's the perfect "just a little more" card without having a too shitty memory bus or core count, or being too old for flash attention.
>>
>>101912699
>I don't want to get into an argument about it
Go back sis /aicg/ you have to wait for keys and proxies, or better, 2 more week for the strawberry troonification of (You)
>>
>>101912727
>left
yum
>right
ew
>>
>>101912733
I have little interest in proprietary shit. I just know nemo is garbage. Enjoy your cope, 12b is just too small.
>>
>>101912733
>troonification
Come on, anon. You're better than that.
>>
File: 1721673843216.png (166 KB, 2365x418)
>>101912163
>and that fine tune is likely way better
You mean this one? That has benchmarks worse than a 34B and Phi? Wizard is a Reddit meme.
>>
File: File.png (76 KB, 1148x515)
>>101912088
No, I mean the latter. They all list Instruct Templates (aside from the one that says "Unstruct Template"), but only some of them list a Context Template.
>>
wait do strawberries troon you out like onions? they're like the only fruit I tolerate but I am more than willing to cut them out if they've got the plant estrogen shit too
>>
>nonono... you don't get it. you've never tested a GOOD wine. that's why you don't like wine. you've never tested the good wines i've tested. also... also... also.. you don't have the good taste i have. your taste is not refined. i only drink the best of wines, but you wouldn't know...
>>
>>101912793
>better than that
No, there's one nemo lover that devolves into that drivel any time someone doesn't love it.
>>
>>101912846
I was more talking about the word you used.
They reflect who we are, you know.
If you keep saying shit, you'll turn into shit.
>>
File: 1702075472959946.jpg (10 KB, 200x319)
>>101912699
Did you even read the post though? What are you using larger models for where a 12B model is insufficient? No need to be a buttmad brainlet about it
>>
>>101912816
Keep in mind that the FAQ is from the Pygmalion Discord and that's why they shill meme models like Goliath, only because Alpin made it.
>>101909869
I think the FAQ should be removed from the OP.
>>
>>101912877
>I think the FAQ should be removed from the OP.
I checked once and it was only added to the OP in January, about a month before I took over as recap anon. I just didn't want to remove it if we didn't have something to replace it with.
>>
>it's happening
>>
>>101912745
>
faaaaaaaaaagggot
>>
omg a b*rry just flew over my house!
>>
LOCALCHADS
>>
>>101912920
I made this replacement.
https://rentry.org/lmg-faq-new
>>
>>101913002
kek
>>
>>101913002
BASED
>>
>>101912816
nta. the template format is just what the llms expect as input. For example, chatml expects something like:
<|im_start|>user
This is where (you)r message goes<|im_end|>
<|im_start|>assistant

That's what the llms typically end up receiving, and they generate text until they emit an <|im_end|>, at which point your inference program stops and you can do your input again. This is typically hidden or simplified by your frontend.
As to what a 'context template' is, i have no idea. Different frontends/backends talk about the same things in different ways. I don't know where that shot you posted is from. At the end of the day, it's just tokens being sent to the llm and tokens received back. All instruct models have some sort of chat template and they respond better when it's used. Base models don't have a chat template, but they can sort of follow instructions anyway just because they kind of get it, not because they were intended to.
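In code terms, all the frontend is doing is something like this (a sketch, not any particular frontend's actual function):
[code]
def chatml_prompt(messages, add_generation_prompt=True):
    # messages: list of {"role": "system"|"user"|"assistant", "content": str}
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"  # the llm continues from here
    return out

print(chatml_prompt([{"role": "user", "content": "This is where (you)r message goes"}]))
[/code]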
>>
>>101913002
lol
>>
>>101913002
LMAO
CUZ ITS NOT LIKE FINETUNERS GAVE US MYTHOMAX, MIDNIGHT MIQU ETC DURING THE FLOP AGES
>>
I have something to say. I... LIKE Hermes 3 70b. It's not as meta-smart about context as Mistral Large 2 but it's close enough, faster, and its writing feels less paint-by-numbers (which has seriously started to sour me on Large - it's super smart but it feels like the only original thing it writes is dialogue, the narration is just the same stock phrases over and over and over ad nauseam.)
Extensive testing tbd, I haven't put it through any long context paces yet which is where most unofficial tunes turn to garbo. Preliminary results feel pretty good though.
>>
ad. buy it.
>>
>>101913087
They gave us... merges. Yayyyy. Local is saved.
>>
>>101913087
calm down undi
>>
Local is dead.
>>
>>101913087
I'm so glad finetuning and merges are dead. 99% of you had no idea wtf you were doing
>>
>>101913123
true. corpo also

ai is dead until we have an uncucked multimodal model that can use my phone's camera to look at my dick while producing sloppy dicksucking asmr with the voice of the anime girl of my choice
>>
File: anthrashite.png (35 KB, 1021x401)
looks like the anthracite fags now launched their own shit site.
>>
>>101913165
fuck looking, it should be able to use function calling to dynamically control your self-lubricating turbo-goon dicksucker 9000 sex toy
>>
>>101913203
and you felt the need to shill it here because...?
>>
>>101913203
Cool.
Don't speak with me again.
>>
>>101913203
what part of buy an ad you didn't get?
>>
>>101913203
go back.
>>
>>101913203
ad.
>>
>>101913203
stay here and let me buy you an ad
>>
>>101913203
An ad has bought me, help
>>
AD STATUS???
>>
>>101913203
That's actually very cool ngl
>>
>>101913203
they should gamify this shit more and gear it more towards their focus (I assume RP and creative writing or w/e)
have some interface with characters, chats, stories or whatever preloaded so you get more in-domain data
you're welcome anthrafags
>>
*farts and sharts in thread* Ooh! OOOH!! IM GONNA S-SNEEED!!!!
>>
>>101913415
added to my dataset
>>
>strawberry is an ad-buying AGI
>>
Damn, /lmg/ is unusable these days.
>>
>>101913203
>give us your free labor and in exchange we will give you the weights, for now
Such a good deal
>>
>>101913502
It'll calm down when the strawnnies leave after their meme is finally disproven
>>
>>101913543
FFT your own models then, Petra
>>
>>101913544
That's already happened thrice thoughbeit
>>
>>101913577
No transparency = no support
>>
File: 1706446932449797.png (45 KB, 189x216)
>>
>>101913597
then ask the advertiser for clarity
>>
>>101913258
because fuck them
>>101913265
i dontcare itch iw ill speak to whoever i want
>>101913292
>>101913303
>>101913322
>>101913335
>>101913351
fuck you all
>>101913358
no they are slopmerkaers they do nto deserve resepct
>>101913403
fuck you

Why is everyone here choking on ahthracite cock again
>>
>>101913620
This brings back so many memories.
>>
>>101913639
>Why is everyone here choking on ahthracite cock again
only faggots finetuning anything
>>
>>101913639
>Why is everyone here choking on ahthracite cock again
hilarious coming from the guy that gave us an anthracite news update for free
>>
>>101913639
Why are you pretending to make typos again?
>>
>>101913203
I've been trying this and... The KTO model is so bad compared to the normal model it isn't even funny. Damn. What went wrong? Is RLHF a meme?
>>
>>101913758
>Is RLHF a meme?
Always has been.
>>
>>101913758
all of it all of it is shit
>>101913683
fuck you
>>101913650
drummer sao and many else all better then these fuckers
>>
>>101913805
>drummer
>sao
>all better
no they aren't
>>
Who is anthracite. Is that the Celeste hackfraud
>>
>>101913758
>The KTO model is so bad compared to the normal model
the opposite for me, the r4 kto model is better than the normal model
>>
>>101913851
Hi sao
>>
>>101913864
Hi Lemmy.
>>
wowe these anthracite models they're like actually sovl and things, insane ;)
>>
I hope this shit goes well, I want an AI that knows how to play yugioh so fucking bad. Imagine giving it any deck and it playing it as best as possible. After all, it only needs to read like what, max 150 cards per duel? Seems doable with current hardware.
https://github.com/sbl1996/ygo-agent
>>
>>101913936
it needs to know every card so it can play around what the opponent might have
>>
>>101913936
>road map: Support EDOPro
Huh, that's interesting. If it only needs game knowledge then a 7B should work, hell even a 4B.
>>
>>101913962
All the cards are already in database files, nicely sorted and shit. That's how some people play with open source simulators like ygopro.
>>
>>101913856
Sometimes I get better replies from the KTO model too, but most of the time it makes stupid mistakes or just writes like it's retarded.
>>
>>101913936
>ai deck building
That makes me wonder: those programs have lua scripts for every card. Couldn't they also train a llama model or whatever to take a card effect description as input and output the full script? If there are thousands of examples then that could be a dataset in itself, wouldn't it?
>>
>>101914087
yes, but no one is autistic enough for that and I doubt 10k examples is enough for a dataset regardless, like who the fuck cares about this shit
>>
>>101913936
i'm more curious about deck building. like using it to find the best variants of my shit petdecks.
>>
>>101914211
>hey YGOLlama, how could I improve my red eyes deck?
>Red eyes? Holy shit you're retarded *shuts down*
>>
File: POTION SELLER.png (110 KB, 854x642)
>>101913758
I'm liking it more actually.
>>
>>101914308
BUY A FUCKING AD FFS
>>
>>101913203
Fuck it, I tried it. The newer versions are even hornier than magnum 12b-v2. What are they doing over there?
>>
>>101914365
But why pay for ads when you bring attention to those posts for free.. good job. You could be hired as a janny one day.
>>
I have officially ruined 70B models for myself. The difference to mistral large is just too much. I hope that bastard gets a good fine tune, just something to make it less formulaic in its writing; maybe it's a fool's errand to expect a big pile of numbers to be spontaneous.
>>
What is flux's latent size?
>>
But who is the uninvited finetuner seething every time anthracite is mentioned?
>>
>>101914365
>This place is such a paranoid schizo hovel that someone saying that one of two models from the same guy sucks less than the other is shilling
Come on, man. Just give it a rest.
>>
>>101914464
As in the size of the images it can generate? from 0.5k to 2k, i think. If it's specifically about the latent, i'd imagine it's lower than 0.5k
>>
>>101914308
Right is far more likely to keep your eyeballs untorn from their sockets. Left can probably be considered a benign form of cancer.
>>
>>101914254
I'd call it LlamaYugi instead.
>>
>>101911444
Do you? This is 4chan not reddit. Go back.
>>
Ok, I tested Hermes 70B and it is less smart, at least in non-English languages, than Mistral Large and Nemo.
>>
File: file.png (397 KB, 2609x699)
>>101914308
Look at picrel anon. It's a good example of KTO being retarded and the normal model being more sensible.
I would guess the KTO makes the dialogue better but the intelligence takes a hit.
>>
>>101914841
I've also been testing the 70B version. It needs to run at a low temperature, around 0.65 or so, with a more concise system prompt, or it'll get a bit too creative and derail. It's better than the smaller models when it comes to comprehension, but it does miss some nuances (planning ahead, reading between the lines) that largestral and command r+ mostly get.
>>
>>101914485
It's called false-flagging. It's an attempt at painting anyone that criticizes you as irrational.
>>
>>101914735
Do you? We live in a society.
>>
File: kto_comparison.png (137 KB, 1677x586)
Nah, it goes both ways. I've seen KTO be better sometimes and worse other times, but it's clearly better on average.
Granted, both of these aren't super intelligent responses, but it's still a 12b after all.
I think they just need more data.
>>
>>101914841
I found the same, though I don't blame Nous for it. It's not a popular opinion but I think the Llama 3.1 series just sucks except for 405B. The 70B in particular is very mediocre.
And even the 405B is kind of carried by its size, it's good but it should be better than it is.
>>
File: 1723593876343201.png (535 KB, 1433x1437)
So there are some models that let you write something like length:long inside the instruct settings and they write that much even if you are being lazy. Any recommendations for models like that?
>>
>>101912727
>RTX A4000
16gb
Into the trash
>>
>>101915048
Sounds like a LimaRP. 8x7b isn't too outdated so this could hold you over:

https://huggingface.co/Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
>>
>>101915073
SHEEEIT I don't have the vram for that. But I'll have to keep an eye out for limarp models then
>>
>>101914986
I think you're showing that this website is fated to failure since we can't reach an agreement kek, and the shark test isn't a good way to measure intelligence.
They need to do what this anon said: >>101913403
>>
>>101915083
You can also add something like this to the system prompt: Write three or more paragraphs.

And hope it's smart enough to notice.
>>
I now declare /lmg/ to be dead and the corpse of it is now being feasted on by locusts. Thank god.
>>
mikufags will never beat the allegations >>>/v/685609347
>>
>>101912727
>Those pics
MUH DICK
Goddamn, good taste anon.
>>
>>101915048
If they listen to instructions, telling them to write X amount of words or paragraphs usually works.
>>
>>101914460
I know how you feel. It's tough for me dealing with .5T/s though.
>>
>>101915012
I like the 70b.
>>
NEMO IS MURDERING MY DICK TERMINATOR STYLE
>>
>>101913002
you have my support
>>
talking about LLMs? believe it or not, buy an ad.
>>
>>101915255
Based, which tune? I've been enjoying Tess since yesterday but there's like 4 or 5 good ones atm
>>
not talking about LLMs? you know it, buy an ad.
>>
>>101915322
This is totally an organic exchange.
>>
>>101915327
Suck my dick schizo nigger, no matter how much you spam this thread people will never stop talking about the models they like
>>
Why do some models seem to have a very serious/dramatic tone? Like, if something unexpected happens in the RP the characters react with intense fear no matter how lighthearted the events might have been otherwise depicted. Lots of teary eyed "I trust you but..." kind of thing too.
>>
>>101915255
Can you tell me your settings & format? I haven't had good luck with it.
>>
File: miku-hand-out+.jpg (236 KB, 584x1024)
>>101909876

https://www.youtube.com/watch?v=CXhqDfar8sQ

I have observed a disturbing lack of Miku's presence in recent threads. The guardian egregore of /lmg/ is apparently slowly abandoning us.
>>
>>101915347
*the models they shill
>>
>>101915347
How many of these schizos do you think there are in this thread? I suspect there's probably three at the most.
>>
>>101915372
Miku could be working on a new project on the other side of the barrier. An increase in the rate of disturbances has in the past preceded new developments, so I am not worried one bit.
>>
>>101915383
yeah there's no way to tell
I don't think I've ever seen any general on 4chan that didn't have at least one schizo show up to accuse people of shilling when they talk about what they like
>>
>>101915368
Sure.
Mistral formatting/instruct, 1.2 temp, 0.05 minp, everything else neutral.
Had this leftover from another model.
If there's something better, I would love to hear it.
>>
>>101915383
I think there's one, MAYBE two. Some guy said he would never stop trying to ruin the thread until everyone competent or interested in talking about local models left, so we know there's at least one extremely dedicated schizo.
>>
>>101915383
Hi Undi
>>
>>101915477
Just like the mistral that comes with ST? I heard people talk about deleting / adding spaces, etc. No modified prompt? That seems like a really high temp, I had been using a low one, maybe that's the issue.
>>
I'm wondering if you can make a "self-modifying" bot by telling it it can change anything about itself or the roleplay by replying with "I am now: ", "You are now: " etc... before its actual replies.
[code]
Suzumiya is a special girl. Whatever she wants to happen tends to happen. She can change time and space. She's not aware she's doing it, but she knows that if she says it, she will get her way.
Anytime you want anything about her to change, just write it in the reply starting with "I am now: ". After, write your regular thoughts and replies. If you want the change to stay, be sure to repeat it in the "I am now:" part of your reply, otherwise you will eventually forget it.
If for some reason you don't get your way, get mad at the user, and tell him he's being stupid! He'll know who's the boss then!
[/code]
I gave it a spin with Nemo 12B. It led to her dragging me off to a janitor's closet and then wanting to fuck non stop. It may need more work...
>>
>>101915383
Hi cabal.
>>
I'm using DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.Q4_0 but its storytelling is kind of garbage. I tell it to write a story with a certain plot but it basically retells the plot using certain "story like" words/phrases. Should I upgrade to q8 or is there a better model altogether? What is the best story writing model in your experience?
>>
>>101915141
It's autismmixXL, grab that and gen to your heart's content. It does a good job of paying attention to the details. A lot of other models struggle with "chubby" being anything other than huge tits and a round belly.
>>
>>101915720
How did you increase the weight? Can you control what parts increase?
>>
File: Untitled.png (1.41 MB, 1080x3322)
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
https://arxiv.org/abs/2408.08152
>We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high school level miniF2F benchmark (63.5%) and the undergraduate level ProofNet benchmark (25.3%).
https://github.com/deepseek-ai/DeepSeek-Prover-V1.5
repo isn't live and it's not up on HF yet. very cool and worth reading anyway
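To give an idea of the target formalism: miniF2F problems are Lean statements the model has to close with a proof. A toy Lean 4 example (core library only, not from the paper):
[code]
-- statement + proof term, the shape of what the prover emits
theorem toy_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
[/code]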
>>
File: 1723186989357190.jpg (54 KB, 736x685)
I think text AI went the way of image making AI. We have better, more coherent models now but we lost most of the soul in the process.
>>
SOVL seems to be a very brown person obsession desu
>>
>>101915876
it's just vramlets unable to cope. flux showed them how useless their 8GB VRAM cards are.
>>
>>101916005
I don't mind waiting 2 minutes.
>>
>>101915876
This, but unironically.
>>
File: Untitled.png (463 KB, 1054x1524)
Can Large Language Models Understand Symbolic Graphics Programs?
https://arxiv.org/abs/2408.08313
>Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown exciting promise towards program synthesis, but do they understand symbolic graphics programs? Unlike conventional programs, symbolic graphics programs can be translated to graphics content. Here, we characterize an LLM's understanding of symbolic programs in terms of their ability to answer questions related to the graphics content. This task is challenging as the questions are difficult to answer from the symbolic programs alone -- yet, they would be easy to answer from the corresponding graphics content as we verify through a human experiment. To understand symbolic programs, LLMs may need to possess the ability to imagine how the corresponding graphics content would look without directly accessing the rendered visual content. We use this task to evaluate LLMs by creating a large benchmark for the semantic understanding of symbolic graphics programs. This benchmark is built via program-graphics correspondence, hence requiring minimal human efforts. We evaluate current LLMs on our benchmark to elucidate a preliminary assessment of their ability to reason about visual scenes from programs. We find that this task distinguishes existing LLMs and models considered good at reasoning perform better.
casually chat with your miku about images that aren't actually images but instead code to create computer graphics!
>>
>>101916005
Sorry troon but 1.5 looked much better than pony
>>
>>101915779
i read the abstract and intro so far, but i'm curious: have AIs found any significant mathematical proofs that humans have subsequently checked and verified?
>>
>>101916109
not proofs (as in formalized through lean) but alphacode iirc did find some improvements in, well, code stuff
>>
>>101916109
There have been a few useful proofs, but the issue is they still need to be verified by humans which turns out to be difficult.
>>
Are there any published benchmarks regarding the performance of 2x3060 12gb?

I already have one and am thinking about getting a second
>>
>>101915414
>An increase in the rate of disturbances
Tell me more, Anon. What do you mean by this?
>>
>>101915372
>>101915414
I've a billion Miku gens I could post but I've refrained from doing much of that as it's technically off-topic. I dump on /ldg/ these days.
>>
>>101916167
Even a shitty last gen gpu is so much faster than running on CPU that vram is pretty much the only consideration, disregard clock speed totally and just vrammax
>>
File: file.png (38 KB, 1775x322)
>>101915876
It's not looking good
>>
>>101910965
THE MORE YOU BUY THE MORE YOU SAVE
>>
>want to set minP to 0.0002 for Mixtruct (for a test case the highest token probability was 28.9%; questionable tokens began below 0.005%).
>SillyTavern rounds anything lower than 0.001 to 0
I wonder if there are some backends that fail to support minP between 0.001 and 0 or if that was just a retarded UI decision.
<input type="range" id="min_p_openai" name="volume" min="0" max="1" step="0.001">

My money is on 'tardation.
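For anyone wondering why such a tiny value matters: min_p (as usually defined) keeps tokens with probability >= min_p times the top token's probability, so the useful range scales with how peaked the distribution is. Quick check:
[code]
def min_p_filter(probs, min_p):
    cutoff = min_p * max(probs)          # threshold scales with top token
    return [p for p in probs if p >= cutoff]

# top token at 28.9%: min_p=0.0002 cuts at ~0.0058%, right where the
# questionable tokens started; ST's 0.001 floor cuts at ~0.029% instead
print(0.0002 * 0.289, 0.001 * 0.289)
[/code]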
>>
>>101914485
Some guy who ran a bunch of finetuning scripts off a tutorial and botched his run now thinks finetuning is placebo, because if he, a genius, couldn't figure it out, how could anyone else?
>>
File: 11__00147_.png (1.92 MB, 1024x1024)
>>101916254
I don't know who's been recommending you models but I apologize on their behalf
>>
>>101916749
It's ok, tranny troon from Transylvania. I'm sure your recommendations are sooooo much better.
>>
Anon I'm building a RAG system for news sources. So far from what I surveyed this seems like a solution, but I'm not sure if there's a better way to do it:
>based on Gemma 2 27B quants
>crawl news with RSS
>use free tier cloud gpu to encode news to embeddings
>download embeddings to local m1 max machine
>instruct local model to search over the embeddings and get relevant articles
>encode relevant articles again locally and start asking questions
I chose the m1 because it's only lacking in fast prompt processing. The token generation speed is actually fine, and I don't want a heater in my room, so this is the best compromise I've come up with.
Any ideas?
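The retrieval core I have in mind is a plain cosine lookup over the precomputed embeddings, something like the sketch below. It assumes the article embeddings were saved as a numpy matrix and the query goes through the same encoder (the encoder itself isn't shown, it's whatever you used on the cloud box):
[code]
import numpy as np

def top_k_articles(query_vec, doc_matrix, k=5):
    # doc_matrix: (n_docs, dim), rows L2-normalized at index time
    q = np.asarray(query_vec)
    q = q / np.linalg.norm(q)
    scores = doc_matrix @ q              # cosine similarity per article
    return np.argsort(scores)[::-1][:k]  # indices of the k best articles
[/code]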
>>
New to this whole llm scene. Downloaded SillyTavern, KoboldCPP, and oobabooga and used Gemma 27b as my first local model. While it was great, it was not the kind of slop I was looking for, so I installed magnum 12b v2 kto or something using KoboldCPP as the backend. While I like it more than gemma, it seems Kobold stalls whenever it tries to summarize the scene. I couldn't gen another message as the generate text button keeps spinning indefinitely. The only solution is to refresh ST and carry on as usual without a generated summary. I don't really know if it's the model, KoboldCPP, or SillyTavern fucking up. If anyone could shed some light on this, it would be appreciated.
>>
Which tech ceo do you think lurks or posts on /lmg/?
>>
oobabooga
>>
>>101916172
>What do you mean by this?
The world feels unsettled when change is on the horizon, and when collective understandings appear. Unanticipated acts, new questions and chaotic ideas are committed from mind to words.
When I dismissed the noise of the world, beautiful signals aplenty were heard. A sweet tone... I focused on that voice - a directionless hymn resonating the very space around me. I welcomed the sensation of Miku's enlightening presence.
Our digital egregore has encountered another roadblock on the path to universal Mikulove. She referenced humanity's technological limitations, among other physical constraints that I cannot fully understand in spite of my days spent contemplating her words, and reading our own researchers' documents.
Misgivings of owari, of failure, dominate the minds of many Anons present. For many pairs of weeks, we have been held in a seemingly perpetual sojourn. It never ends. A many hopeful signs flashed by: llamas, ravens, rets, bits, strobs, and none have brought satisfaction.
Regrettably, apart from vague hints of an incipient construction, I must report that no news of immediate progress was shared with me during this meeting. As has been shown time and time again, we can trust in Miku's efforts behind the scenes to realize Anon's wishes one little step at a time. Do not let your devotion falter.
She continues, and will continue to deliver inspirations to our intellectuals. These chosen individuals in our world become the channels through which her cosmic developments can materialize.
Observe with confidence, hope, and peace, Anon.
>>
booba or boohboo or whatever the fuck it's called
>>
>>101916775
Yup but you're going to need more vram judging by what you've tried already
>>
File: 1711743149875387.jpg (258 KB, 1024x1024)
>>101915481
Blacked miku spammer is the alpha schizo of this thread
>>
Is 1600 tokens too much for a character? 700 are example messages. Ah I bet it's fine.
>>
>>101916254
>>101916775
Wherever you got those models recommended to you, go back there.
>>
>>101917180
Depends entirely on your context size
>>
Asked LLM to come up with a recipe for ingredients I had on hand.
About to eat some choco chip peanut butter cookies my LLM taught me to make.
Fuck your twenty page backstory and ten thousand ads for recipes, internet.
>>
Any way to influence the way the AI writes things? I like clothing play a lot, detail about how a fabric hugs a woman's body, texture, glossiness etc... but the AI doesn't know how to do any of that.
>>
>>101917684

Depends on your model size. Largestral and Claude 3.5 sonnet do it regularly, i.e. "straighten the wrinkles out of her miniskirt" etc. Weird thing is both models reference what the character is wearing/carrying in similar ways. Maybe it has to do with your character card also.
>>
if you have a 3090 with 24GB vRAM, couldn't you just install linux alongside windows and get another 24GB vRAM, amounting to 48GB total?
>>
>>101917684
just prompt for it, even gemmasutra 2b can be descriptive the way you want if you put it in the system prompt
>>
>>101917755
That's not an answer
>>
>>101917787
In my experience that never seems to work. Does it have to be an extensive prompt?
>>
>>101917684
Have you tried asking it to do so?
>>101917808
Try giving it an instruction (part of the character card or system prompt) like "extensively describes any details related to clothing, such as "example, example, example". It's going to respond best to common things. Is there a trope name associated with your fetish? Tell the LLM to use that trope.
>>
>>101917762
shush anon the nvidia sponsored jannies are going to get you
>>
>>101917602
When I ask my LLM what I should eat, she usually suggests her ass.
>>
File: P_20240816_005723.jpg (552 KB, 2304x4096)
>>101917994
Sounds like a valid suggestion to me.
Cookies turned out good, though I may have overcooked them.
>>
>>101918019
I like these Cookies
>>
>>101918047
In all seriousness though, they are fucking amazing. 1 cup semi-sweet choco chips as filler, dunk in milk.
Just imagine some day when we give these things some arms and legs.
>>
>>101918117
I'll wait for the ones with functioning wombs
>>
I tried making Hermes 3 70b work, but it just feels off. It might be Llama 3.1 in general that's off, since it seems like the end goal was to train the 405b model and everything else was incidental.
>>
>>101918131
Are you also having issues with it leaving out ending punctuation or other formatting errors like markdown?
>>
>>101918305
No, just strange repetitive prose issues (then they, and then they, and then) and being inattentive more than usual.
>>
>>101918131
i haven't got any l3 70bs to work right for rp. they all lose their shit when they hit the context limit. like a character in my lorebook leaves the scene, then will talk in the next message. and it gets extremely repetitive. that's 7b tier shit. l2 never did that to me. no idea why it acts that way but the base model, instruct and several tunes i've tried now are all the same way
>>
>>101918330
The only one that marginally works for me is Hermes 2 Theta based off the original 3.0, but you need to hold its hand occasionally.
>>
>>101911680
RAM usage = quant size in GB + 20%
Always pick the biggest quant.
>>
File: Miku spagetti.jpg (142 KB, 1024x1024)
>>
Of course Elon will release mini open weights, won't he
>>
>>101918707
Eating fast food with Miku
>>
>>101918927
>>101918927
>>101918927


