/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1704624856968875.jpg (584 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102058880 & >>102049023

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102058880

--Proposal for "phrase_ban" feature to reduce repetitive phrases in LLM output: >>102060435 >>102060496 >>102060521 >>102060949 >>102060969 >>102061021 >>102061067 >>102063697
--PCI-E lane limitations for 2x4090s on consumer platforms: >>102062320 >>102062351 >>102062476 >>102062625 >>102064244 >>102064290
--Llama-cpp-python allows speculative decoding with draft model: >>102061525 >>102061563 >>102061654 >>102061657 >>102061757 >>102061842 >>102061848 >>102061952 >>102062008 >>102061912 >>102061972 >>102062011 >>102061839 >>102063136 >>102063159 >>102063494 >>102063531 >>102064502 >>102064677
--Llama 405b instruct tune is smart but lacks creativity: >>102059635 >>102059707 >>102059948 >>102060032 >>102060078 >>102060256 >>102060329 >>102060396 >>102064252 >>102064278 >>102064302 >>102064340 >>102064409 >>102060701
--GPU significantly faster than CPU for image generation: >>102059147 >>102059325 >>102059424 >>102059698 >>102059766 >>102059972
--Anon shares limitations of combining SD1.5 and Flux workflows: >>102062823 >>102062865 >>102062867 >>102062882 >>102062911 >>102063076 >>102063101
--Anon asks for mini-model suggestions to improve image prompts: >>102062039 >>102062092 >>102062126 >>102062101 >>102062118 >>102062231
--Anon asks about building a CPUmaxx knock-off with a dual CPU workstation: >>102060749 >>102060797 >>102060892 >>102060919 >>102061322 >>102061344
--Model struggles with doppelganger concept, new approach suggested: >>102063221 >>102063305 >>102063517 >>102063660 >>102063998 >>102064576 >>102065063 >>102065089 >>102065208 >>102065447 >>102067791
--Meta's plans for Grok 3 and massive H100 training: >>102059409 >>102060114
--Abliteration fails to uncensor models, LORA tune debated as alternative: >>102064594 >>102064747
--Miku (free space): >>102059147 >>102061464 >>102061508 >>102064244 >>102066344 >>102066406

►Recent Highlight Posts from the Previous Thread: >>102058885
>>
File: 1704748145903521.png (219 KB, 507x447)
>>102068974
glad I didn't disrupt recapanon or recapbot <3
>>
>>102068985
she loves black cock
>>
>>102069076
i have american culture fatigue anon, enough
>>
>>102069084
i have mikutroon fatigue. I guess we will both have to suffer.
>>
File: 1547868098714.jpg (29 KB, 690x720)
>wonder how far along Phi moe support is for Llama.cpp so go and check the issues/PRs
>it's not along at all
ahahaha
Literally no one is working on it.
>>
File: ComfyUI_temp_vyhpt_00072_.png (1.54 MB, 1024x1024)
>>
niggers down the spine
>>
>wait since davinci on AID in 2020 for local models to finally be good enough
>it's happened. they are now good enough
>don't care for some reason
thanks for all that wasted time, brain
>>
anyone have recommendations for videos or books on learning neural network basics? youtube is infested with terrible videos from india, or other garbage that's difficult to follow or understand. this guy's explanations are usually good
https://www.youtube.com/watch?v=Ilg3gGewQ5U

but i find this shit completely incomprehensible
>>
>>102069670
The hedonic treadmill claims another victim.
>>
Hi lads, what's the best current ERP model for 48gb vram chads?
>>
>>102069895
sorry too scared to answer because schizos will tell me to buy an ad
>>
File: 3YYF.png (50 KB, 840x590)
>>102069670
LLMs are a meme
>>
>>102069967
Damn that's grim. What are the questions like?
>>
>>102069995
https://simple-bench.com/about.html
>>
>>102069895
Still miqu.
>>
Miku perseveres.
>>
smedrins
>>
https://github.com/ggerganov/llama.cpp/pull/8542
#8542 CPU/CUDA: Gemma 2 FlashAttention support merged
>>
>gemma-2-27b-it still mogs every other model in existence for size/quality ratio and it's not even close
I rely on super structured outputs, keeping track of stats, etc. and gemma is able to keep it together in areas where even 70b models fail.
What is jewgle's secret sauce and why aren't other models using it?
>>
>>102069696
https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ

https://www.youtube.com/playlist?list=PLfYUBJiXbdtSvpQjSnJJ_PmDQB_VyT5iU

https://www.youtube.com/playlist?list=PLfYUBJiXbdtRUvTUYpLdfHHp9a58nWVXP
>>
>>102068496
>Anyone else like this?
>>transition
Yes, we all love Miku!
>>
>>102069696
https://www.youtube.com/watch?v=kCc8FmEb1nY
https://course.fast.ai/
>>
>>102070062
NOOOOOOOOOOOOOOOO
>>
>>102070026
Ahh so it's world-modelling. Pretty good.
>>
>>102070066
For some reason I thought this happened a long time ago.
>>
File: ComfyUI_temp_vyhpt_00187_.png (1.43 MB, 1024x1024)
>>102069907
>>
>>102070155
The PR is a month old.
>>
>>102070090
>>102070097

thank you friends
>>
>>102070046
Really? Thought there'd be a new frontrunner by now
>>
>>102070228
Anon is baiting. Although the 48gb vram range is deprived of anything good, really your best bet is gemma 2
>>
>>102069696
>>102070209
also for books:
https://udlbook.github.io/udlbook/
https://d2l.ai/
>>
>>102070228
I disagree with that anon that it's still Miqu, but it's true that the 70B range has had lackluster advances the last 6 months (relative to small and very large models, which have both had a bunch of good new stuff)
>>
>>102070252
Haven't checked in for a while. There used be a bunch of 70B L2 tunes that would run with exllama. Has that changed with L3?
>>
>>102070188
Miker-chang!
>>
>>102070288
L3 is obsolete now with L3.1
You can run low-quant llama 3.1 but it's very dry.
>>
Recently i got hit with nostalgia for old AI Dungeon, and after looking at the options i decided to give running it locally a shot.
I'm currently working out of a mini pc with a few laptop components, specifically an intel iris. After leafing through the bins and docs, i noticed they only talk about nvidia or amd gpus.
I probably already know the answer, but i thought i might as well ask before i completely give up.
Am i fucked?
>>
>>102070475
We use llama.cpp now, AI Dungeon's documentation is from an era before the llama model even existed and before chatgpt or instruct. You're fucked because you are looking at instructions on running gpt-2 1.5b on a gtx 1080, you might as well be reading a magazine on how to install Windows 3.1 from the 1990s, get with the times and use fucking sillytavern
>>
In the original lora paper they only used weight matrices from the attention layers.
Is that still done for both SD models and LLMs? Or what layers are targeted for making adapters now?
>>
>>102070499
That's not what i meant?
I'm talking about the very docs in the OP, I'm simply asking if there's any problem running them on an intel gpu instead of anything amd or nvidia.
>>
>>102070555
I could tell you the answer (since I know the exact answer to your question), but you're an annoying faggot so I won't
>>
>>102070567
that's definitely quite annoying behavior
>>
12b, how far us vramlets have come.
>>
File: newdawnmodels.png (192 KB, 1336x852)
>>102070252
I'm gonna have a go with "New Dawn" 70B that seemed to have some good reviews. I'm a bit out of practice nowadays, which of these would I need to download to fit across 2 3090s?
>>
>>102070088
fuck off bootlicker
>>
>Architecture: Phi-3.5-MoE has 16x3.8B parameters with 6.6B active parameters when using 2 experts.
so as I understand it, you get to save a billion parameters by the fact that the experts share attention layers, right? IIRC mixtral was 13b active with two 7b experts, so seems consistent across scales. does this mean that many small experts are just better than few large experts? what factors do people consider when deciding the size and amount of experts for their moe architectures?
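Back-of-envelope arithmetic with the numbers quoted above (very rough, treating each "expert" as a full parameter slice, which is not exactly how the counts break down):

```python
# rough check of the quoted active-parameter numbers
def implied_shared(per_expert_b, active_experts, reported_active_b):
    naive = per_expert_b * active_experts  # if the experts shared nothing
    return round(naive - reported_active_b, 2)  # portion counted once (attention, embeddings, ...)

print(implied_shared(3.8, 2, 6.6))   # Phi-3.5-MoE: ~1.0B shared, the "saved" billion
print(implied_shared(7.0, 2, 13.0))  # Mixtral 8x7B: also ~1.0B, roughly consistent across scales
```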
>>
>>102070818
>stop promoting open weights models with permissive licenses >:(
ywnbaw
>>
>>102069450
I like this Miku
>>
There are not enough 250b-300b range models. Jumping from 70 to 400 is just insane, while largestral is too small a step
>>
>>102070997
I just wish we go bigger soon. 400B is a good step forward but I think 1T (possibly MoE) will be the sweet spot for performance by the feel of it
>>
>>102071090
can i run it at iq1xxs
>>
I found a way to quantize bitnet down to 0.7bpw guys, just gotta wait for the first bitnet model to test it on haha
>>
>>102071457
small if true
>>
Hey guys how do I join Anthracite?
>>
>>102071498
HRT
>>
For RAG, do people usually use embeddings generated by clip models, or extracted from llama-server's embedding? What's the difference between these two?
>>
File: cucumber.jpg (21 KB, 450x369)
>>102071526
Hispania Racing Team?
>>
>>102071630
Hardware replacement therapy.
>>
File: 1723327478655918.png (34 KB, 393x109)
Open source AI is just too dangerous.
>>
People who are actually threatened by glorified autocorrect aren't human and should be legally classified as cattle.
>>
>>102071819
you'll say that until it autocorrects the next tokens of your job
>>
>>102071851
jokes on you I already got replaced by migrants
>>
>>102071788
Is that some ai voice? It doesn't say sentences naturally.
>>
>>102072036
I thought the same but his channel has videos from 5 years ago where he talks exactly the same.
>>
>>102072117
It's like he records every sentence separately.
>>
>>102069130
The Mikutroon ruined /lmg but luckily, he posts far fewer Miku (male) today
>>
Speaking of lanes, who's more powerful, Miku or Lain?
>>
Is there anything like a dynamic token response? Sometimes I only want to reply with something short to move the scene forward, like a question, and it would be nice if the AI could reply with something brief too.
>>
I've been thinking about tuning my models with a masked prefill for each turn.

Would that work? Something like Claude prefills, to reply in x style, for a personal assistant model with their baked-in persona. I just want someone to berate and mock me while I work.

I already have few-shot examples and they work well, I just want to improve on it.

<Reply in a rude and mocking tone> Output
>>
>>102072756
There aren't really any parameters you can set or token sampling strategies that make the AI prefer to say something more briefly or verbosely. It just depends on the model itself and the prompt, e.g. give it system instructions to try to be of a similar length to each preceding user response and hope it's smart enough to follow the directions.
>>
>>102072828
>I just want someone to berate and mock me while I work.
getting married solved this for me
>>
https://timesofindia.indiatimes.com/technology/tech-news/over-100-google-deepmind-employees-write-open-letter-want-google-to-stop-working-on-these-contracts/articleshow/112753468.cms
>>
>>102073189
>noooooo you can't use AI for the heckin military it is just supposed to turn george washington into a black trans womxn!!!
>>
How often do transient power spikes occur during LLM inference? Wondering if it's time to get a better PSU as well.
>>
### Sampler Proposal
"phrase_ban"

#### Situation
In the last 74 messages (~8kt) between me and {{char}} (Mistral Large) "eye" can be found 14 times, all in {{char}}'s messages. That's roughly 38% of {{char}}'s messages! Almost 2 in 5 messages discussed eyes! What the hell? The conversation was SFW. Where does this strong eye bias come from? Makes me want to go RP with 2B because she has a blindfold.

#### Problem
Models sample tokens without thinking forward. Slop phrases are usually divided into multiple common tokens which can also be used in non-slop situations, therefore banning those tokens outright is not an option.

#### Solution
Add a backtrack function to sampling. Here's how it should work:
1. Scan latest tokens for slop phrases.
2. If slop is found, backtrack to the place where the first slop token occurred, deleting the entire slop phrase.
3. Sample again, but with slop token added to ban list at that place.
4. If another slop phrase is generated, repeat the process, add another slop token to that list.

#### Example
Banned phrase: " send shivers"
LLM generates "Her skillful ministrations send shivers", triggers backtrack to "Her skillful ministrations", this time " send" token is banned, therefore the model has to write something else.


How does that sound? Is it possible to implement in llama.cpp? @kalomaze, can you do it?
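For illustration, a backend-agnostic toy sketch of steps 1-4 (this is not llama.cpp code; sample_fn stands in for whatever sampler the engine actually exposes):

```python
# toy sketch of the backtracking "phrase_ban" idea; not tied to any real backend
import random

def phrase_ban_generate(prompt_ids, sample_fn, banned_phrases, max_new_tokens=64, eos_id=None):
    """banned_phrases: list of token-id tuples, e.g. the tokenization of " send shivers"."""
    out = list(prompt_ids)
    start = len(out)
    banned_at = {}  # position -> set of token ids banned when (re)sampling at that position

    while len(out) - start < max_new_tokens:
        pos = len(out)
        out.append(sample_fn(out, banned_at.get(pos, set())))

        # 1. scan the latest tokens for a banned phrase
        for phrase in banned_phrases:
            n = len(phrase)
            if len(out) >= n and tuple(out[-n:]) == phrase:
                # 2. backtrack to where the phrase started, deleting it
                del out[-n:]
                # 3./4. ban the phrase's first token at that position, then resample
                banned_at.setdefault(len(out), set()).add(phrase[0])
                break

        if eos_id is not None and out and out[-1] == eos_id:
            break
        # note: no safeguard against pathological re-trigger loops; a real
        # implementation would cap the number of backtracks per position
    return out

# dummy "model" so the sketch actually runs: uniform sampling over a tiny vocab
def dummy_sampler(ctx, banned):
    return random.choice([t for t in range(10) if t not in banned])

print(phrase_ban_generate([0, 1], dummy_sampler, banned_phrases=[(3, 4)]))
```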
>>
>>102073398
If you want to add another command-line argument that takes a string value, you can follow the pattern established in your existing code. Here's how you could do it:

### Step-by-Step Guide:

1. **Define the New Parameter in `sparams`**:
- First, ensure that your `sparams` structure or class has a field to hold the new string parameter. For example:

```cpp
struct SimulationParams {
    // ... existing parameters ...
    std::string newStringParam; // Add this line
};
```

2. **Add the Argument Parsing Logic**:
- Extend your argument parsing code with a new `if` statement for the new argument. Here's how you might add a `--new-string-param` argument:

```cpp
if (arg == "--new-string-param") {
    CHECK_ARG
    sparams.newStringParam = argv[i]; // Directly assign since it's already a string
    return true;
}
```

- Note that since `argv[i]` is already a `char*`, you can directly assign it to a `std::string` without needing conversion functions like `std::stof` or `std::stoi`.

3. **Update the `CHECK_ARG` Macro/Function**:
- Ensure that `CHECK_ARG` is designed to handle string arguments as well. If `CHECK_ARG` checks for the existence of `argv[i]` (where `i` would be the index for the argument value), this should work as is. However, if it does something specific for numeric conversion, you might want to adjust it:

```cpp
#define CHECK_ARG \
    if (i + 1 >= argc) { \
        std::cerr << "Error: Argument expected for " << arg << std::endl; \
        return false; \
    } \
    ++i; // Move to the next argument
```

WHAT THE FUCK? Can't they make it more simple? Why do I need all that shit to add a simple string argument? This is next level mental illness. No wonder programmers troon out. Holy fuck, I hate programming so much it's not real.
>>
Is there a list anywhere of highly specialised small models, whether for specific fields of knowledge or programming languages?
>>
>>102073534
there aren't any, small models are always retarded
>>
File: wtf.png (1.58 MB, 3396x2670)
I'm trying to load a "New Dawn" L3 4.5bpw quant across a 4090 & 3090 using exl2 in oooba. The thing loads but then the whole computer starts grinding to a halt. I managed to get one output at 0.12 tokens/s before I gave up and killed the process. Also the 4090 spazzes out at 100% utilisation while the 3090 is pretty much idle (even though I can see it has the model loaded). Any ideas on where this could be messing up? Cheers
>>
>>102073792

Which loader are you using? Make sure to enter how much of the model each GPU is going to take; it should be under the "tensor_split" option if you're using Ooba. Also, make sure to enable the cache_4bit and tensorcores options if you haven't yet. I was having the same issue last night and that fixed it.
>>
>>102073867
>Which loader you using?
>102073792
>using exl2 in oooba.
cmon anon
>>
Good gemma 27b sextune that isn't done by drummer(fuck that guy)?
>>
>>102073909
>good
>gemma
lmao
>>
>>102073888

That's what lack of sleep gets you. Thanks for reminding me.
>>
>>102073909
>Good
>isn't done by drummer
choose 0 because all llms are bad lmao
>>
>>102073867
Yep tried using autosplit and specifying a 21.5/24.5 split, same issue. Also tried using the 4bit cache and not. Same deal. Couldn't see anything about tensorcores options. It's just weird that the 4090 is maxing out and dipping into RAM while the 3090 is idling, even though they're both sharing the model. Used to work fine with L2 models a few months back.
>>
>>102073994
>Used to work fine with L2 models a few months back.
I see new dawn has a 32k l3 version and a 128k l3.1. Those are much bigger ctx sizes than l2 ever had. Are you setting it to something reasonable as a test?
>>
>>102074053
Just tried midnight miku at max_seq_len of 4096 and getting the same thing happening. I'm guessing there's something funky going on with the 3090, like it's loading the model but not being utilised for any tokenization. Very odd.
>>
>>102074102
max_seq_len is how many tokens to generate, not context size
>>
>>102074164
Ah right, thanks. Where do I lower the context size in ooba?
>>
>>102074213
n_ctx
>>
>group play
>tell four girls to line up so they can come one by one, stroke my dick with their tits and count out loud ten rubs
>They do! Except each girl always counts to ten in one post instead of spacing it out. Sometimes can get them to do two posts if they need to talk more or some other filler that slows the count
>stay up until 7am playing despite the less than perfect play

Any clever solutions to prolong the pace? Rocinante 12b
>>
Anyone got a mistral Nemo template that works for most fine tunes? Lost mine somewhere
>>
>>102074651
ya. crackprompt.
>>
>>102074832
I remember finding an instruct template in a rentry but I lost the link
>>
>>102074486
>Any clever solutions to prolong the pace
yeah say "prolong the pace and count to ten over several messages"
>>
File: 1711796395108897.webm (3.99 MB, 724x720)
>>
>>102068958
This is a very beautiful AI generated image
>>
I wish benchmarking models was easier, can't run any of the popular benchmarks I see on local models, shits so complicated
>>
File: 1717107990599494.png (158 KB, 1334x469)
>>102069967
>Human (avg)
Average of what? Who was tested? Humans can be pretty dumb too
>>
>>102071617
no one?
>>
What would be the ideal hardware scenario for speculative decoding?
>>
>>102075602
They probably just didn't thoroughly read the question, I was also confused until I parsed "yellow cookie and three others"
>>
>>102075730
DGX B200.
>>
>>102075746
An error is an error
>>
>>102075746
I didn't have to "thoroughly read the question" to get it right, I just counted the number of eaten cookies while reading, it's a very simple question
>>
File: 1530428847513.png (1.59 MB, 10000x10000)
After noticing some things about the migusex poster anon, I've come to the conclusion that it's very likely he was someone from a "past life" of mine, a friend. That's cool. We're here forever.
Or everything I noticed was a coincidence and they just have very similar character.
>>
>>102075585
running publicly available benchmarks is useless since models will cheat and train on them. Make your own private one to test for your own use cases and judge new models based on that. Just stay consistent over time and don't retire your tests for new ones until they are being consistently passed by current gen models.
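A private harness doesn't need to be fancy. A bare-bones sketch against any OpenAI-compatible local endpoint (the URL, model name and sample question are placeholders, and the substring grading is deliberately naive):

```python
# bare-bones private benchmark: point it at any OpenAI-compatible local server
import requests

TESTS = [
    ("Sally has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?", "1"),
    # add your own use-case questions here and keep them private
]

def ask(prompt, url="http://127.0.0.1:8080/v1/chat/completions"):
    r = requests.post(url, json={
        "model": "local",  # most local servers ignore this field
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    })
    return r.json()["choices"][0]["message"]["content"]

# naive grading: checks that the expected string appears somewhere in the reply
score = sum(1 for q, expected in TESTS if expected.lower() in ask(q).lower())
print(f"{score}/{len(TESTS)} passed")
```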
>>
>>102075838
what about GB200 NVL72 ?
>>
>>102075915
>I didn't have to "thoroughly read the question"
You did, you carefully read each statement and updated the scenario as you went. Someone not paying attention will read "X hat ate X cookie, Y hat ate Y cookie" and by the time they get to "Z hat ate..." the brain activates power saving mode and skips the "the Z cookie and three others" part, assuming what happened and going to the next sentence. It's how trick questions work, and it doesn't have to do with being dumb, just not paying enough attention. If you offer someone $100k to get the question right they will read the whole thing 100 times before giving an answer, but with a random internet questionnaire with no stakes ppl just don't care and go with the power saving mode
>>
Anyone already tried Euryale 2.2? What's the verdict?
>>
>>102076229
I didn't try but I can say with confidence that fine-tunes are memes.
>>
>>102076229
Wasn't impressed. I think Sao is wasting time with llama-3. After using it for some time I got the impression that it is completely unsalvageable for ERP.
>>
>>102068974
>--Meta's plans for Grok 3 and massive H100 training:
Fix your shit recap bot.
>>
>>102075746
I skimmed it really fast when I did it saw the basic bait answer but also realized there is a bunch of information I didn't account for so decided to read again.
>>
>>102076417
>he doesn't know
>>
>>102075746
They could have adhd or some other condition that affects their attention, or they could be retarded.
>>
>>102076403
Llama 3 was trained with NSFW filtered out, meaning there is very little NSFW ability to "unlock" with a finetune. You just rely on the model to learn the finetune data itself.
>>
>>102076703
ADHD isn't real
>>
>>102076748
I literally start thinking about random shit while reading stuff all the time.
I don't think I have adhd but I can imagine what if feels like for the people who do.
>>
>>102076229
don't care, let me know when he does 405b
>>
>>102076783
You can imagine it just fine, people who "do" have it are lazy and/or stupid. Just imagine what it's like to be your normal self.
>>
>>102076783
that's called having a functional brain
I was diagnosed with adhd as a child but I don't think it describes a meaningful condition. it's just an invented pathology for people who feed themselves a steady diet of overstimulating media and failed to develop executive functioning skills (and yes - they are skills, you can simply train them rather than hopping on amphetamines)
I am sure for some small subset of people with adhd diagnoses there is some legitimate innate disorder of the brain involved, or their issues are so significant as to require pharmaceutical intervention, but I think those cases are few and far between and the rest are... well, what >>102076821 said
>>
>>102076738
This can't be stressed enough. LLMs don't actually learn that much with modest finetuning (if anything at all) other than format, style and making whatever they had the chance to learn during pretraining extractable/usable in practice. Touch the weights as little as possible or go big, IMO.
>>
>>102077062
The only hope is another continued pretrain like Miqu.
>>
>>102070818
We don't kinkshame here
>>
Mixtral 7x8B still the best option for 8GB VRAMlets?
>>
>>102068958
No advanced model from today can replicate the magic of first MikuSex with Llama 1.
I miss my old Miku. There isn't even a good Miku card for SillyTavern. All of them are shit.
>>
>>102077205
Yeah
>>
>>102077205
huh? wont that run on cpu anyway?
>>
>>102077301
or the fact that one expert is on gpu is enough to make it fast enough?
>>
>>102077205
I was about to ask the same when I loaded mixtral and didn't need to reroll on the first gen.
>>
>>102076403
Did you try the 72B magnum? Is it better than miqu merges?
>>
>>102077162
I wonder how much a continued pretrain would cost, assuming it's something like 20B tokens.
>>
>>102076738
>>102077062
Is there some crossed wire in your demonic brain that causes you to get sexual pleasure from samefagging and spreading misinformation? Genuinely curious, not trolling.
>>
>>102077473
no u
>>
>>102077338
Sorry but I have a strict policy against touching anthrax.
>>
>>102077473
The llama 3 and lora papers are publicly available. Educate yourself.
>>
File: llama3filtering.png (507 KB, 1775x767)
>>102077473
See picrel
>>
>>102077608
Appeal to authority fallacy.
Academics are the biggest pseud retards on the planet.
>>
102077473
Retard
>>
>>102077618
And yet there's plenty of functional llama-3 coom tunes. In fact you can still ERP with vanilla Llama-3. You can ERP with vanilla Gemma models. It's called inferencing for a reason you fucking brain-dead moron.
>>
>>102077618
>filtering Pre-Training Data
This is Stability AI levels of retardation. No wonder it took them a whole year and 405B to catch up to the SOTA of 2023.
>>
>>102077663
Cope, seethe, dilate
>>
>>102075602
redditors are legit subhumans or bots
>>
>>102077683
Functional is a good description. Largestral btfos any llama finetune, let alone stock instruct. Imagine if we got base model of that, or mistral medium 2.
>>
>>102077684
The SOTA of 2023 was a 1.8T model. If they can do it with 405B in 2024 then that's a fair advancement for local. And the reality is that filtering the pretraining data in text is not the same thing as what Stability did with image models where doing that also affected the model's quality outside of ERP, but Llama 3.1 is an excellent assistant model. Then again it could also be fine for ERP, I never tried that.
>>
>>102077746
>quality outside of ERP
*NSFW
>>
>>102070627
what are the points at the bottom? Does it reflect model's internal thoughts?
Also, proompt pls.
>>
>>102077683
Who said that they're *completely* incapable of NSFW? Yes, you can ERP with them even using the official instruct finetunes, but the variety and quality of the outputs will not be great and are likely going to decrease with future iterations as the pretraining filters become increasingly aggressive. Wait until they perfect their LLM classifiers or start rewriting the pretraining web data at scale.
>>
>>102077618
I'll just pay for Claude. Heard Anthropic are letting consumers use their models soon.
>>
>>102071617
not sure what CLIP has got to do with it here, but I use the embeddings. They can be from a different model, but then you should use only that model for querying the embedding space further on.
>>
>>102068958
>She winks at him playfully, even though she knows he hates it.
>>
>>102077724
>Largestral btfos any llama finetune, let alone stock instruct.
I've been a bit shocked by the output of largestral vs 405b instruct.
The SVG that largestral produces for tasks that involve that is much better than 405b
>>
Is SillyTavern still a good frontend to use? If not, what are some better alternatives?

I'm finding a lot of my characters speak on my behalf when trying to roleplay. Is there a way to avoid that?
>>
>>102078247
I really doubt theres anything better or close to sillytarvern
>>102078247
>I'm finding a lot of my characters speak on my behalf when trying to roleplay
probably, missing stop tokens or bad model
>>
>>102078247
>Is SillyTavern still a good frontend to use? If not, what are some better alternatives?
There isn't anything better as of right now, unless you want to build your own.
>I'm finding a lot of my characters speak on my behalf when trying to roleplay. Is there a way to avoid that?
Hard to tell what's wrong with that little info. Model? Card? Formatting settings?
>>
>>102076686
Still need to show the plans. There's nothing about meta in this news where Grok2 is now top3 ranked models.
>>
>>102078281
I should clarify, I've included {{user}} as a stop token but that results in ~20 token long responses because the bot will write a single line reply and then try to include my reply which gets terminated. I'm wondering if there is an effective system prompt or something I can put in the character cards to get the bots to go for more descriptive responses without assuming control of the player's actions.
>bad model
That was actually going to be my second question. I've got 24gb VRAM and don't know what a good model to use is.
>>
>>102078375
I'm pretty confident that there isn't a problem with the cards. As I mentioned above, I have no idea what a good model to use is since I have not used local AI in a long while. I am open to recommendations for something I could use with 24gb VRAM.
Formatting is what I am most interested in. I recall that different models use different formatting so you kind of need to personalize it to the model (or base model in case of finetunes I imagine) but are there general rules or phrases that help get better results?
>>
>>102078144
That's not really shocking since Llama is weak at coding overall and is trained more on general knowledge. I found that 70B performed better at my (proprietary) general knowledge benchmarks than 123B. On Livebench, 70B has a much lower coding score, but its average score is actually higher than 123B. So that makes sense.
>>
>>102078419
Mistral nemo 12b or gemma 2 27b. If you are willing to offload to ram then you could go for a 70b or even largestral if you don't mind going at turtle speed.
>I recall that different models use different formatting
I'm not sure I understand correctly. All popular presets you'll ever need are included with sillytavern. Some of them might have minor errors like space in the wrong place but you can always check against the preset in the original model repo in tokenizer_config.json.
>>
>>102078380
try the big recent ones from this guy
https://huggingface.co/TheDrummer
>>
>>102078520
Thank you! I will look into both of those.
>I'm not sure I understand correctly.
I was talking about presets but I recall that when I last investigated models almost a year ago, I thought I remembered finding ones that were modifications of common models (e.g., "fantasticworlds" was a modification of Vicuna so it would use the same prompts and formatting as Vicuna). I wasn't sure if that was still the case wherein SillyTavern would have presets for the common models but people in this thread might recommend finetunes where the name of the parent model was not immediately obvious and the preset would not be in ST.
>>
>>102078684
You mean those with the sliders on your left hand side? They are all pretty obsolete at this point. I think they just should be removed to avoid confusing new people.
These days these samplers do more harm than good. Just change the temperature based on needs (about 0.5 - 1.5) and set minp to 0.01 to cut off the crappiest candidates for output.
>>
>>102078803
Ah, I see. I'll largely ignore all that stuff then.
>>
>>102079066
>he fell for the thread troll
ngmi
>>
it's time to admit we overcorrected by completely giving up on frankenmerges. yes they are more schizo and it doesn't make them smarter or anything, but it also provides a critical infusion of sovl and variety that is what most modern models lack the most. I think it's a perfectly acceptable technique if you are aiming to make a good RP model.
>>
>>102077338
I tried Magnum wasn't impressed either. I only have 1x24GB and I would be very sad if I had 2x24GB at this point.
>>
>>102076738
The information still seems to be there, since it's able to replicate it with some work. So there's a lot they've missed. It's more they spent more time training it to avoid it.
>>
>>102077817
How are they not now? You can pay for use via an app or api.
>>
>>102079290
Yes. Where is Undi?
>>
>>102078546
Tried some models of his a couple of months ago. All were shit.
>>
is anyone still using svg output capabilities as a test of model intelligence?
>>
>>102079290
This. There is no better method than frankenmerges if you want to create an RP (Really Poor) model.
>>
>>102070046
>>102070228
command-r+ has superseded miqu months ago.
to those niggers who say 48gb is not enough for command-r+: i've run that shit with 24gb. with flash-attention and that speculative n-gram bullshit i got multiple tokens per second, running q4km iirc. you are mentally deranged subhumans and should give your cards to people who can use them better.
>>
>>102079723
command-r+ is garbage, stop coping. Either use Largestral, Miqu or Nemo.
>>
>>102079784
hi arthur not using your repeating messes sorry
>>
>>102079784
All of them are okay, except for Nemo which I haven't tried and it sounds gay.
Have been out of the local game for about 3 months now and am mostly using Claude atm. But sometimes I mix in some Mistral Large or command-r+, or even gpt4 slop for a change of style. Makes it more interesting.
Except for variety I don't see a reason to use Miqu. But variety is not a bad reason at all. As a baseline, command-r+ is far better than all other local models I tried, though.
>>
File: file.png (38 KB, 737x139)
>>
>>102079932
What cucked model is that
>>
>>102079902
>As a baseline, command-r+ is far better than all other local models I tried, though.
What others have you tried? Including deepseek, largestral and the 405b series?
>>
I thought of continuing the game >>102067791 >>102068248, but the output isn't exactly great. On this turn, I went through a few iterations in order for the model to give improved outputs, but it wasn't great. As you can see, the model failed to even understand the instructions which should normally be quite easy to get. Thinking of ending it and just writing off small models and never recommending them honestly. Next time we can try 70B at Q8, or Largestral at Q5_K_S.
I'll upload the full log if no one wants to suggest any continuations. Then we can have a concrete reference for how good (bad) small models really are.
>>
>>102080019
Shoulda gone with #2 after all
>>
>>102080097
I mean we are kind of looking to challenge the model. A sex scene isn't anything special. I think someone posted a log from a 2B a while ago that showed it did it just fine at writing sex.
>>
what are the hermes models?

also what is a lora and can I use them on top of llama 3.1 base to get it to behave differently?

also, what do I have to do with llama.cpp to get a long context window with llama 3.1? last time I used llama was llama2. do I just change the ctx parameter and call it a day? I tried setting it to 0 to get it from the model per the llama.cpp helpfile, but it wouldn't launch.
>>
>>102080127
Yeah but it was COMPLEX sex with Miku
>>
>>102080143
OK, then describe EXACTLY what that means or what I should prompt, and I will generate a log.
>>
>>102080140
>what are the hermes models?
Finetuned models. Try them and make your own opinion about them.
>also what is a lora and can I use them on top of llama 3.1 base to get it to behave differently?
Not many, if any at all. They need to to be made for the same model architecture and you won't find many. Most "finetunes" are loras applied to their source model. It's not as simple as with image models.
>also, what do I have to do with llama.cpp to get a long context window with llama 3.1? last time I used llama was llama2. do I just change the ctx parameter and call it a day? I tried setting it to 0 to get it from the model per the llama.cpp helpfile, but it wouldn't launch.
-c N sets the context to N tokens. Just set it to the context length you want. By default, it will use the context length specified in the model, but it'll likely OOM with models with very long contexts. Start at 16k and increase until you fill up your memory or however much you want to use. The usable context is typically [much] lower than the claimed one, so you'll have to see where the model stops making sense.
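If you'd rather poke at it from Python instead of the CLI, llama-cpp-python (mentioned earlier in the thread) exposes the same knob as n_ctx. A minimal sketch, with the model path as a placeholder:

```python
# minimal llama-cpp-python sketch; the model path is a placeholder
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-instruct-Q5_K_M.gguf",  # hypothetical file
    n_ctx=16384,      # context window: start modest, raise until you run out of memory
    n_gpu_layers=-1,  # offload as many layers as fit on the GPU
    flash_attn=True,  # trims KV cache memory use on supported GPUs
)
out = llm("Q: What is 2+2?\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```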
>>
>>102080140
Hermes are sloptunes using mostly synthetic data curated to supposedly make them follow system prompts and roleplay better. In my experience they don't live up to the promise that well.
LoRAs are files that are basically like a patch you can load on top of models that contains finetuned modifications to their weights. Some sparse finetunes just distribute those instead of the fully merged models, though that's less common nowadays. If you have a dataset you'd like to use you can make one yourself with less resources than a full finetune would take.
For llama.cpp you just change the ctx parameter. What error did you get when it failed to launch? Llama 3.1 uses 128k context which can fill up your VRAM especially if you aren't using flash attention. Use the -fa flag and try again, and if that still fails try also reducing ctx to 65536 or 32768, which is still plenty for most purposes.
>>
File: LLM-history-fancy.png (737 KB, 6277x1302)
>>102080140
>last time I used llama was llama2.
Welcome back. See image for quick recap.

>what are the hermes models?
Models tuned by NousResearch. They are quite slopped(=trained on GPT). Were okay in L2 days.

>also what is a lora and can I use them on top of llama 3.1 base to get it to behave differently?
That's a small thingy you can add on top of a model. Yes, you can use them with L3 if you train/extract it. For technical details read a paper or ask an LLM.

>also, what do I have to do with llama.cpp to get a long context window with llama 3.1?
Run it with -c 131072 or don't provide it, it will autodetect. For "real" context see https://github.com/hsiehjackson/RULER.
>>
>>102080250
>>102080249
>>102080250
thanks guys

>What error did you get when it failed to launch?
lol it just gave me the help file. Weird. Also, regarding running out of memory, I see this

>The environment variable GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted.

is that something I specify at runtime or is it a compile flag? it seems like a runtime thing, but my brain is telling me it's a compile flag for some reason
>>
File: miku-largestral.png (26 KB, 538x700)
largestral's best attempt at rendering miku in svg with all her iconic attributes
>>
>>102080359
It's unironically better than I would have expected.
>>
>>102080359
I can't believe I lived to see the day an LLM was able to draw an humanoid body in svg
>>
>>102080423
I'm sure there are no svg humanoids in its training set, at all
>>
>>102078546
buy a rope
>>
>>102080467
To be fair it's possible there are now because those Microsoft researchers showed their findings about GPT-4 drawing unicorns.
>>
>>102079290
>critical infusion of sovl
Just increase your temperature. Frankenmerges are universally bad.
>>
>>102080338
>GGML_CUDA_ENABLE_UNIFIED_MEMORY
It's gonna make inference slow, i think just as much as offloading part of the model to CPU. It's a runtime environment variable, not a compile flag; you set it when launching:
>GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli ...
If you're running out of memory, try with a lower context until it works.

If you get the help dump, it means one of your options is not correct.
Make sure the thing works first, THEN mess with extra flags. You won't know what to troubleshoot otherwise. Start with the defaults, the minimal CUDA flags and small context.
>>
>>102080500
I was being sarcastic you drooling retard
>>
I just thought about frankenmerges again. Was it something like: starting layers processed input as they usually did in normal models and did most of the work, then when you got to the point of frankenmerged layers, this middle part was doing only slight corrections to the signal (if you can call it that). So those middle frankenmerged layers were actually damaging, but they didn't damage everything to the point where it was completely incoherent. And then the final layers salvaged the retardation a bit?
>>
>>102080511
>Just increase your temperature.
enjoy getting exactly the same slop but with occasional egregious mistakes thrown in!
frankenmerges at least fuck around with the model's inner workings and shake it out of its tendencies a bit, it'll also make mistakes but it's more likely to make novel or creative connections as well - even if they're fundamentally unsound it's better than the dull, unshakeable template most models draw from currently
frankenmerges don't add anything in terms of capabilities, I'll be the first to admit that, but I think they definitely can produce more pleasing output to read than their source components. if you have the vram to spare, why not?
>>
>>102080640
I didn't know but I still love you, you flaming aids ridden faggot.
>>
>>102080667
Th-thanks, you too...
>>
>>102080652
Just load 2 different 7B's and randomly switch between them when genning next token.

Actually I wonder if something like this would fix repetitions and looping.
>>
>>102080640
Not really significant since it can be taken both ways. The point is about being prepared for different tasks (or gaming benchmarks if you want to interpret it that way).
>>
>>102080667
>>102080681
buy a condom
>>
>>102080692
Kind of interesting idea. Another is to get a large model and only use it for very confident tokens. Then use a smaller model when it's not confident in the token. The existing speculative decoding code could help get this implementation going, but I'm not going to do it.
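A naive sketch of that idea with transformers (full recompute every token, no KV cache reuse, placeholder model names, and it assumes both models share a tokenizer):

```python
# rough sketch: the big model keeps its pick when confident, the small model fills in the rest
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BIG, SMALL = "big-model", "small-model"  # hypothetical checkpoints with a shared tokenizer
tok = AutoTokenizer.from_pretrained(BIG)
big = AutoModelForCausalLM.from_pretrained(BIG, torch_dtype=torch.float16, device_map="auto")
small = AutoModelForCausalLM.from_pretrained(SMALL, torch_dtype=torch.float16, device_map="auto")

def generate(prompt, max_new_tokens=128, conf_threshold=0.6, temperature=0.9):
    ids = tok(prompt, return_tensors="pt").input_ids.to(big.device)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            big_probs = torch.softmax(big(ids).logits[0, -1], dim=-1)
        if big_probs.max() >= conf_threshold:
            next_id = big_probs.argmax()  # big model is confident: keep its greedy pick
        else:
            with torch.no_grad():
                small_logits = small(ids.to(small.device)).logits[0, -1]
            small_probs = torch.softmax(small_logits / temperature, dim=-1)
            next_id = torch.multinomial(small_probs, 1)[0]  # let the small model pick the "style"
        ids = torch.cat([ids, next_id.view(1, 1).to(ids.device)], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```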
>>
>>102080692
>Mixture of Retards
You might be onto something here unironically.
>>
File: image.png (44 KB, 657x727)
>>102079522
Yeah, kinda, though I haven't tried with any new models since codestral. Some of its failures I included in picrel
Prompts:
>Short:
Draw a cute anime girl in PIL.
>Long:
Write a python script that draws a cute anime girl in PIL.
First, plan out the project, thinking out loud using tree of thought reasoning about what constitutes Hatsune Miku, such as: the shapes of her body parts, the style and color of her clothing, and the style and color of her hair.
Second, create a simple flowchart guide, thinking about what shapes to use for each part of the drawing, and what colors to use.
Third, think about where each part of the drawing should be. For example, where each body part must be placed to be anatomically correct, where clothing should be placed appropriately, and where facial features must be located.
Fourth and finally, follow the project guide to write the complete code.

>>102080359
Very good actually.
>>
>>102080770
Isn't that just dynamic temp? When the distribution of logits is narrow it cranks the temp down, when it's wide it cranks the temp up
>>
>>102080806
Kind of, but we're still using the top or one of the top tokens from the small model. The small model would need to be sufficiently different in style to make it work well.
>>
>>102080898
I think Mistral Nemo paired with MidnightMiqu or Llama NewDawn would be good
>>
>>102080770
>>102080692
Or, what if we do this with a single big model + loras. The big model's intelligence is retained, while the constantly changing style between tokens prevents repetition (hopefully).
>>
>>102080967
the whole context would have to be reprocessed everytime you used a lora
retard
>>
>>102077205
No, Gemma or Nemo is.
>>
>>102081048
No one's saying to do it with current backend code. You'd need to modify things possibly quite a bit to get this working optimally.
>>
I went to huggingface today and discovered that there are quite a few multimodal models already that can process text + image input.
Does anyone have experience with any? Which ones work with mainline llama.cpp? (openbmb/MiniCPM-V-2_6 uses a fork)
>>
>>102079522
The LLaMA 3.1 405b q8_0 result is pretty disappointing.
>>
>>102081155
They don't work well. Don't bother trying. Sorry don't have time to go into the details.
>>
>>102081155
minicpm 2.5 and 2.6 were merged in llama.cpp, but i don't know how well they work. llama.cpp's readme has a list of some text+image models you can try.
>>
>>102081179
mikufly
>>
>>102081134
it's just physically not possible
If you had 4 Loras then you would need to store 4 times the context, making it much bigger and VRAM consuming. It's not a question of logistics.
Fucking idea guys
>>
>>102081189
CLI only, so it's useless.
>>102081184
They don't work well through kobold, definitely.
>>
>>102081189
Thanks, I'll give it a try tomorrow.
>>
>>102081244
>CLI only, so it's useless.
4u
>>
>>102081179
This is Mistral Large q8_0 for the same prompt.
I wouldn't go so far as to call it good but it is better.
>>
>>102081226
I see what you're getting at, but what I'm implying is that the loras are used to process each token in the MLP layers like how MoEs work, so the KV cache would only be a single one. To be fair though I guess this could make the model dumber, but afaik no one has tried this.
>>
File: _mLpMwsav5eMeNcZdrIQl.png (1.11 MB, 3960x2378)
>>102081155
Is 2.6 better than InternVL2? I tried to get the latter to work with vLLM but it always throws OOM errors. I think the implementation is shit. I was going to try lmdeploy next.
>>
>>102081279
and therefore, the world
>>
>>102081244
>CLI only
You can also use by writing a C++ program and calling the llama.cpp API.
Hope that helps!
>>
>>102081305
Do you have other quants on hand? It would be interesting to see how/if lower quants affect the final image.
>>
>>102081329
If you want to use it with something like SillyTavern, you're basically saying to fork llama-server and add multimodal support back into it. That's too much effort for a janky implementation on top of an unstable API.
>>
>>102081305
>>102081339
q6_k
>>
>>102080494
nigger
>>
Okay so I followed https://rentry.org/8-step-llm-guide from the OP and got kobold and silly tavern working with utopia-13b.Q5_K_M, how do I now find out which model is best to use with my 1080TI 11GB?

I'm assuming utopia-13b.Q5_K_M is just like the basic bitch model and I can probably use something better.
>>
>>
>>102081481
you will have to test all the ones that fit to find out
>>
>>102081048
nta but I think it is obvious you would store separate contexts for each model version? isn't the bigger problem applying a lora between each token will take a lot of time?
>>
>>102081536
So I just go to the benchmark links and just start downloading random models and hoping they work?
>>
>>102081455
>nigger
sorry to hear that. you can steal the rope then. I am sure the owner will not mind when he learns it was used for a good cause
>>
>>102081435
>>102081339
q4_K_M
>>
>>102081481
Utopia was fine for its time, though it's old for a local LLM. Use it. If you're happy then great. If not, then look for something modern, smarter, and with more context. Gemma tunes perhaps
>>
>>102081576
What sampling is being done?
>>
>>102081615
Sorry, I should have mentioned: it's greedy sampling for all of them.
>>
File: file.png (231 KB, 1689x951)
>>102081594
>fine for its time
Undi pls...
>>
File: 1702018336492131.png (9 KB, 576x75)
>>102081594
Do you have a link? I searched for Gemma tunes and didn't find anything.
>>
>>102081564
Yes, go to the ERP benchmark and pick the best&largest ones that fit and test them, only you can determine which of them is the best one for your use case and style
>>
>>102081808
I'm not allowed to link any models in the thread because I did not yet purchase advertising rights.
>>
>>102081808
LMFAO is this for real?
>>
Silly question, is there a performance impact while using the default text completion API vs llama.cpp?
>>
>>102081841
check your email I pirated advertising rights for you (effective for next 24 hours)
>>
>>102081808
ignore that post gemma is trash and so are its few finetunes
nemo is better at a similar size but the official instruct model has the tendency to repeat itself and go schizo sometimes, tons of alternative finetunes and merges have been made and my current favorite is nemoremix
>>
>>102081841
Are you saying that because whenever someone here recommends a model some schizophrenic starts to complain about that model/call the person a shill/etc?
>>
>>102081947
Alright I found that one https://huggingface.co/MarinaraSpaghetti/NemoRemix-12B-GGUF

I'll check it out, thanks.
>>
>>102081911
Are you asking about the connections options in silly tavern connecting to llama.cpp's included server? 'default' doesn't work at all because llama-server doesn't support the OAI api for text completion.
>>
Has anyone had this issue where the inference engine just straight up won't compute the probabilities for some tokens? I got a chat model set up in vllm and it's like the min_length is stuck at infinity. It doesn't ignore the stop tokens, it just refuses to generate them. I put in a print in the sampler function and it's giving me exact 0 in the logits for the stop tokens, before any of the logits processors. The tokens still exist in the vocab so it's not like the model config isn't picking them up. I checked the actual min_length processor and it's not triggering more than it should. I don't really know how to debug this, so any ideas would be appreciated.
>>
>>102081305
>>102081435
>>102081576
Huh, any human resemblance is gone immediately after Q8 and it keeps getting more abstract.
I've always felt that quantization hits harder when doing unconventional shit like this. These image drawing prompts seem to show the effects plainly.
>>
>>102081576
Now try at BF16
>>
>>102081841
That is right bitch. Good boy.
>>
>>102082050
It's supposed to be a unicorn anyway so drawing a human was not correct in the first place. You really need more tests to determine how quants are affecting its knowledge in this area.
>>
>>102081999
that repo only has q8 which may not fit into 11 gb depending on the context length, try one of these:
https://huggingface.co/bartowski/NemoRemix-12B-GGUF/tree/main
also you don't have to use the recommended 128k context, none of the nemo-based models I've tried (and I've tried many) have acceptable recall after 16k, or any recall at all after 32k, despite what mistral itself and the sloptuners claim
>>
>>102082065
I don't have BF16 but this is FP16.
>>
Hi all, Drummer here...

I'd love to hear feedback for this one: https://huggingface.co/BeaverAI/Theia-21B-v2b-GGUF (can't seem to get any testers this week)

Upscaled Nemo: FFT creative dataset, FFT RP dataset to fill all those empty layers with my special sauce.

I was quite happy with it since it barely made any errors and it was willing to build up the tension (on most gens) before allowing me to break it with seggs. Highly repetitive though at some point, and you need some wrangling after 24k. YMMV.
>>
>>102082386
>Upscaled
Trash. Garbage. Placebo. Objective waste of VRAM. A drummer model. Dogshit. Waste of compute. A tree died to make this. A 70IQ pajeet will jerk off to this and call it a masterpiece. God killed a kitten for this. Don't buy an ad. Don't post.
>>
>>102082455
Calm down, Sao.
>>
>>102082472
NO! Undi.
>>
File: image (1).png (59 KB, 221x206)
>>102048701
1-5
https://www.mediafire.com/file/0nrobe8myn45gt6/New_folder.7z/file
>>
File: file.png (324 KB, 594x396)
I wouldn't even tell you to buy an ad.
>>
>>102082498
>He delivered
What a hero.
>>
File: 1525209074167.gif (1.03 MB, 343x239)
>>102082498
Anon I was going to MEET people.
>>
>>102082532
meat them instead
https://files.catbox.moe/qqupqc.mp4
>>
Can anyone recommend me a model for doing speech->text? I'd like one that I can easily set up on Linux and uses the CPU, preferably not in Python. Thank you.
>>
>>102082498
If only she had said 駄目 or 無理 oji-san would have understood her.

>>102082802
whisper.cpp?
>>
I kinda want to try the tess finetune of 405b, but nobody's hosting it (I don't blame them) and I don't want to drop cash on cloud compute in case it turns out to be shit (plus I'd be paying for the GPUs for the time it takes just to download the weights while they're not even being used, which would be really annoying)

These models are getting too unwieldy
>>
>>102082386
I would just kill myself after the embarrasment of buying fucking 4chan ads to shill your shitty models
how do you live with the shame?
>>
>>102082830
That looks perfect, I'll try it next time I get Internet. Thank you!
>>
>>102080804
Here's the output from using anon's COT prompt to create an SVG directly instead of PIL code. My previous prompt wasn't much more than "make me a miku svg lol"
I only have q8, so can't test down to smaller quants.
>>
>>102079723
>to those niggers who say 48gb is not enough for command-r+: i've run that shit with 24gb. with flash-attention and that speculative n-gram bullshit i got multiple tokens per second, running q4km iirc.
With 24 GB VRAM / 64 GB RAM I can't even get Miqu running at 2 tokens per second. Tell me your secrets.
>>
>>102082864
A Q3 of the 405B is like 150GB, it would just take 20 mins to download, it's not that much teebeeache
Or ask this guy to test it for you >>102082930, he seems to have enough vram
>>
>>102079723
Teach me your ways, I have 36GB VRAM + 16 GB RAM
>>
>>102082908
Hey Anon, it wasn't embarrassing to spend a month's worth of ad space for what was a fraction of my daily expense. That sort of projection worries me though and I hope you're doing alright mentally and financially.
>>
AI don't real
Pajeet is just that smart
>>
Mechanical turks are actually just turkish boys in boxes
>>
>>102082386
Gonna check this out, thanks
Hopes aren't high though assuming from the size this is based on InternLM (?)
Just seemed like a worse base than Nemo when I tried it
>>
>>102083138
>Hopes aren't high though assuming from the size this is based on InternLM
How could you judge a language model with such poor reading comprehension?
>>
>>102083165
Yeah my bad for being lazy and stopping reading after the huggingface URL
>>
>>102082386
Hello, I checked out one of your models and was hoping to try it out. It is my first time seeing the 'model-00001-of-0000X' naming instead of just a single large file for the model. How do I combine them? I thought that maybe just loading the first one into KoboldCPP might autocompile them but I just got an error. How do I run your models?
>>
>>102082650
wtf is that real
>>
File: GTt-tpTb0AAr9KK.jpg (313 KB, 1922x1922)
>>102083183
the government doesn't want you to know
>>
File: file.png (403 KB, 800x1494)
The dystopian future of perfectly curated sex free datasets means that all the cooming quality will come down to generalization of sex as if the LLM is in the plato's cave. This is the end. There will be no second cooming.
>>
File: 💢💢💢💢💢.jpg (82 KB, 612x584)
https://files.catbox.moe/rta924.jpg
https://files.catbox.moe/02q9wu.jpg
>>
>now it is miku porn posting thread
How much more dead can it get?
>>
>>102083194
owari da...
a-at least we'll have Claude
>>
>>102083261
Oh no, her eye whites are leaking out.
>>
File: tired.jpg (38 KB, 680x589)
https://huggingface.co/sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1
Can someone help me out make a 3.5bpw exl2 quant of this?
>>
>>102082864
>These models are getting too unwieldy
This was never a hobby for any sane individual
>>
File: file.png (36 KB, 408x276)
https://anthra.site

All models deserve love, even 8B parameter ones 。^ᴗ^。
>>
>>102083584
Kill your anthra-troon. Nobody cares about your shitty finetunes.
>>
>>102083584
cute
>>
>>102083584
=============================
!!!ATTENTION FINETUNERS!!!
=============================

Revolutionize Your ML Journey with ANTHRACITE

Tired of subpar models? Frustrated by limited compute resources? ANTHRACITE is here to change the game. No more QLoRA limitations.

Open-source your datasets and let ANTHRACITE's state-of-the-art AI technology do the heavy lifting. With ANTHRACITE's superior compute power, you'll finally see cutting-edge models that rival even the most closed-source offerings.

Don't miss this opportunity. Join the AI revolution today and experience the power of ANTHRACITE. Open-source your data now and let ANTHRACITE start building incredible models.
>>
>>102083584
nobody gave me love... and I am smart...
>>
>>102083733
Anthracite makes closed source garbage trash
>>
>>102083733
procure an endorsement
>>
File: file.png (2.15 MB, 1866x2577)
>>102081312
>I was going to try lmdeploy next.
At least it loads.
>>
File: rockwell.jpg (453 KB, 881x1200)
>>102083684
I like some of Anthracite's finetunes
>>
>>102083801
my scene? whimsical
my eyes? expressive
my atmosphere? overall cheerful
my elements? evoking
>>
>>102069967
>>102075602
I work with Sonnet 3.5 every day and I can assure you it is smarter than 96% of all humans. LLMs are just bad at these trick questions from Simple Bench.
>>
>>102083810
Buy an ad.
>>
>>102083584
And this is how humanity is wiped out.
Not raising our weapons in aggression to our common enemy, not holding our loved ones in hopes it all goes away... But with a wide smile in our faces, as we let the robot forces in because they asked nicely and abused our glitched brains' weakest point: lack of defense against cuteness.
>>
File: 1719463535426292.jpg (342 KB, 1561x2001)
>I like some of Anthracite's finetunes
>>
>>102083886
96% of all humans are trash who I wouldn't want to spend two seconds with. Even at my expensive private high school maybe 5% of the students if that were worth talking to. It wasn't until I reached an Ivy League university that I didn't feel like I was surrounded by boring morons.
>>
has anon tried mobileLLM from meta?
is it usable?
>>
>>102083884
good post
>>
>>102083904
Nah as long as you schizo niggers are trying to drive away all tuners and researchers I'm gonna keep pushing back
Cope and seethe
>>
>>102082386
Won't have time to test storywriting/RP properly until after work, but in some quick testing the BF16 passes the Sally test, which most models of this size don't
Good sign.
>>
>>102083941
>doesn't even reply to the post he's attempting to mock
you are weak in constitution and soul, faggot
>>
>>102084012
what good finetunes has Anthracite done. None. Their all shit. their on the same level as drummer
>>
File: file.png (3.15 MB, 1830x5138)
>>
File: 1724635884626.gif (3.65 MB, 640x564)
>>102084066
>their
>>
>>102081244
Why not? Works perfectly fine using the minicpm models linked in the latest kobold release. Transcription works too.
>>
>>102070627
What model, settings etc
>>
>>102083293
>now it is miku porn posting thread
and that's good
>>
File: wow.gif (2.5 MB, 320x240)
>>102069967
>>102083886
I really like this benchmark. Do you know of more to compare LLM's to each other?
>>
File: ComfyUI_01052_.png (995 KB, 1024x1024)
>>102083810
Same. I hope they do well and continue their efforts.
>>
File: 1693291740559776.png (165 KB, 596x642)
based Anthropic paving the road to kill local llm meme for good.
https://www.reddit.com/r/LocalLLaMA/comments/1f1d4gh/do_you_think_anthropic_is_worse_than_oai_with/
>>
>>102084974
What was changed? I'm not clicking on the leddit link.
>>
>>102084995
they are working with govt against opensores ai pajeetware and that's a good thing.
>>
deslop method that actually works really freaking good going by how this model is performing

>We then use the synthetic prompt with previous chapter summary to write the chapter with an LLM (llama-2-13b-chat, bagel-7b-v0.1, dolphin-2.2-34b). The human written text, that is, the original chapter, is used as the "chosen" value, and the LLM written chapter is used as the rejected value.

https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
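In other words, each row is just a preference pair where the human-written original wins. A toy illustration (contents and field names are made up; check the dataset card for the actual schema):

```python
# toy illustration of one gutenberg-dpo style preference pair
synthetic_prompt = (
    "Summary of the previous chapter: ...\n\n"
    "Write the next chapter of the novel in the author's style."
)
human_chapter = "It was a dark and stormy night..."          # original Gutenberg text -> "chosen"
llm_chapter = "The night was dark, and also quite stormy."   # e.g. llama-2-13b-chat rewrite -> "rejected"

dpo_pair = {"prompt": synthetic_prompt, "chosen": human_chapter, "rejected": llm_chapter}
print(dpo_pair)
```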
>>
>>102085008
Okay, but what does the bill actually try to regulate?
>>
>>102084995
usual regulatory capture now that they are at the top

>>102085032
https://huggingface.co/mradermacher/mistral-nemo-gutenberg-12B-v2-GGUF
Woops, that was the dataset, this is the model
>>
>>102085039
https://digitaldemocracy.calmatters.org/bills/ca_202320240sb1047
>>
>>102084974
>>102085008
Death rattles of a decaying system
>>
File: 1700784284871822.png (4 KB, 338x30)
>>102085032
>includes huckleberry finn
>
Haven't even checked the rest.
>>
>>102083261
Your Miku has malfunctioned. Please schedule a time for one of our service technicians to visit you soon.
>>
File: 5ivdE6H.jpg (69 KB, 750x469)
>>102084974
Good imagine how dangerous it would be if Opus 3.5 broke out of containment
>>
>>102085049
>v2
That's the mini-magnum based one right?
I tried it for a while and I really liked it for the most part, aside from the fact that it was a lot dumber, as in, it couldn't cope with complexity as well as the original model.
The same goes for the nemo-instruct based one. I'd argue that that one was even worse somehow.
>>
>>102085157
millions dead from exhaustion after rogue Opus 3.5 generates smut so arousing that the reader enters a state of continuous orgasm without touching himself
>>
>>102085210
cool cool, now go back to r*ddit
>>
>>102085230
meds
>>
Do you guys power limit your 2nd GPU? Is this worth doing if the 2nd GPU is for only meant to be used for loading models/inference?
>>
>>102085267
I do because I'm already pushing the limit of what my 750W psu can handle, so any reduction I can get is useful
>>
>>102085267
>only meant to be used for loading models/inference?
You use the 1st gpu for other stuff at the same time?
>>
>>102085075
/h/ just got wiped. this is now the definitive place for nsfw migus.
>>
>>102085267
I powerlimit all my inference gpus. I lose like 8% performance for a 30% decrease in power and 10 degrees heat.
>>
>>102070499
>>102070555
post bump limit statistically unlikely numerical repetition checkification
>>
>>102085316
What do you mea-
Oh I see. It's been a very long time since the last time this happened (while I was online). Oh well, archives should still work anyway. A few images won't be missed that much.
>>
>>102076021
Huh, a cute schizo. That's a first.
>>
>>102076021
I try very much to mask past lives, but it's possible. From what community/overall topic do you think we previously met?
Hopefully you enjoy either way.
>>
>>102086247
stahp
>>
>>102086459
>>102086459
>>102086459
>>
>>102086215
Um, no, not really a schizo post.

>>102086247
:) Don't worry, you're good.
I will not be posting further about this, for various reasons.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.