/g/ - Technology






/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101134566 & >>101125756

►News
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: tet.jpg (44 KB, 637x358)
►Recent Highlights from the Previous Thread: >>101134566

--Paper: Adam-mini: Use Fewer Learning Rates To Gain More: >>101141337 >>101141838
--Papers: >>101140697 >>101140766 >>101140609 >>101140655 >>101140878 >>101141988 >>101140729
--Template for AI-Powered Person with Human-Like Interactions: >>101141420 >>101141500
--Voice Synth After Elevenlabs' Changes: New Projects and Challenges: >>101144328 >>101144357 >>101144434 >>101144650 >>101144660 >>101144808
--Vectordb-ing Wikipedia for Efficient Querying and Embedding Archives: >>101141307 >>101141318 >>101141329 >>101141327 >>101143811
--Probllama: Ollama Remote Code Execution Vulnerability (CVE-2024-37032) – Overview and Mitigations: >>101134926 >>101135029
--Overclocking A6000 Memory for Performance Boost: >>101138329 >>101138358 >>101138445 >>101138570 >>101138599 >>101138640
--Hypothesis for Improving Character/System Prompt Following with Stat Tracking Section: >>101139285
--DCLM's Standardized Corpus of 240T Tokens from Common Crawl: >>101135598
--Claude 3.5 Sonnet vs GPT4o: Model Comparison and Limitations: >>101135803 >>101135886 >>101135872 >>101141487
--Cambrian-1: A Vision-Centric Multimodal LLM for Enhanced Spatial Understanding in Text RP: >>101142603 >>101142681
--Research on Predictable Decision Making in LLMs by Siyan Zhao: >>101136382
--Open LLM Leaderboard Updates and Skepticism: >>101139019 >>101139036 >>101139045
--Llamafile 0.8.7 Released with Fixes and ARM Performance Boost: >>101140227 >>101141232
--LLMs' Reasoning Ability and Dataset Limitations in Character Counting Tasks: >>101134613 >>101134742 >>101134793 >>101135151 >>101140188 >>101140213 >>101140272 >>101140325 >>101140442 >>101140673 >>101140761 >>101140874
--Jamba Instruct Model Released on OpenRouter Platform: >>101137926
--Benchmark: PyTorch 55% Slower than llm.c for GPT-2 Training: >>101136766
--Miku (free space): >>101136681 >>101137085 >>101141355 >>101141139

►Recent Highlight Posts from the Previous Thread: >>101136593
>>
So what's the deal with DRY repetition penalty? I read about it; it tracks repeated sequences of tokens so it tries to prevent full phrases from repeating, instead of just single words or tokens like regular rep penalty. Is it better? Does it work well? What range do you set the multiplier to?
>>
>>101144968
I guess it's over and Nvidia will soon close their regular GPU department. Why make 40/5090 when they can make more workstation cards?
>>
>>101145075
In theory it should work well exactly because it deals with n-grams instead of tokens, but I haven't had repetition issues in so long that I didn't even bother testing it.
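Rough sketch of the idea for anyone curious (not the actual implementation in any backend, parameter names are just illustrative defaults):

def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_len=2):
    # longest suffix of the context that, extended by `candidate`, already occurred earlier,
    # i.e. how long a repeat this token would continue
    match_len = 0
    for n in range(1, len(context)):
        pattern = context[-n:] + [candidate]
        if any(context[i:i + n + 1] == pattern for i in range(len(context) - n)):
            match_len = n
        else:
            break
    if match_len < allowed_len:  # short overlaps (common word pairs etc.) are left alone
        return 0.0
    return multiplier * base ** (match_len - allowed_len)

# before sampling: logits[candidate] -= dry_penalty(context_tokens, candidate)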
>>
bitconnet... what went wrong...?
>>
>>101144942
thank you recap anon
>>
>>101145313
people always preferred bytes.
>>
>>101145313
too dangerous for our democracy.
>>
Is that Cambrian model relevant when there's chameleon? (and can chameleon now output images when finetuned, or not?)
>>
>>101145075
does it affect code/function calling/json outputs?
>>
>>101145560
Not Miku
>>
>>101145560
>>101145576
>>101145588
based
>>
>>101145560
>>101145576
>>101145588
I just don't get the race obsession.
>>
>>101146004
He's american tranny
>>
>>101146030
friendly fire
>>
Damn I was going to ask for opinions on the 4080 Super since it's now under $1k, but damn, what's the point when you can buy a used 3090 for $700.

Also, wholesome Migu.
>>
How the hell do you prompt CR+ in SillyTavern? Not talking about the system prompt, just the whole setup. Can anyone give an example screenshot? Would really appreciate it. Because either my quant is too low, or ST's default example prompts for it are shitty, or both.
>>
File: 1711072659524103.jpg (811 KB, 2048x2048)
>>101144935
>is it Teto Tuesday already?
>>
File: rag.png (21 KB, 593x439)
Bros... is it now well and truly ogre for us? How do we compete with this?
>>
>>101145435
It is a mystery.
>>
>>101146848
Isn't this old? Or is this a new update and now it automatically puts your chats in memory?
>>
>>101146871
It's the first time I see this popup. Now in settings I can go to a "memory" page but it appears to be empty still, even after a bit of chatting. I'm not sure if this is actually RAG or if it's just some kind of system message injection.
>>
>>101146514
What exactly do you want to know if not the system prompt? You mean preset? Default silly preset is indeed bad, use this in Story String:
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>
# System Preamble
You are a co-author, writing with me.
## Style Guide
You narrate for {{char}}.
{{#if system}}{{system}}
{{/if}}

## Additional information about {{char}}
{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}### {{char}}'s personality:
{{personality}}
{{/if}}{{#if scenario}}### Scenario:
{{scenario}}
{{/if}}{{#if mesExamples}}### Example dialogue:
{{mesExamples}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}

# User Preamble
I will be narrating for {{user}}.

{{#if persona}}## Additional information about {{user}}
{{persona}}
{{/if}}<|END_OF_TURN_TOKEN|>
>>
>>101146899
Ok this isn't new then. Weird that you're somehow just getting this now.
>>
>>101147002
maybe cause i'm a leaf
>>
>>101146923
>no safety preamble
>no expert roleplayer instructions
highly suspect
>>
>>101146923
Not him but thanks.
>>
I wanna finetune wizard 8x22 on limarp.
How much VRAM do I need and what should I use? Axolotl?
>>
File: 1695912428662624.png (22 KB, 778x290)
>>101147110
depends
>>
>>101147110
>maybe if I throw more slop at this slopped model it will be less slopped
>>
>>101147151
>2400GB
Holy fuck dude...
>>
>>101147151
>2 bit qlora
wait is that a thing? How retarded are the results?
>>
File: What.jpg (159 KB, 2880x1406)
What do they mean by this?
>>
>>101147165
It's fine, one epoch of 4bit qlora is totally enough to get money on ko-fi
>>
>>101147151
I can afford the machine for 8bit qlora, but - how would dataset size impact the requirement (would it?) and how long will it take?
>>
>>101147181
Yeah, very weird. What surprise could they have that they plaster it over a leaderboard?
>>
>>101147175
Generally, model training size > bits.
A 2bit 8x22b model is generally going to out preform a 8bit 13B model. That said...
>>
>>101147181
Real surprise. With real confetti this time and with real tragic consequences.
>>
jameleonbyte-bitnet-MoE-MoA-MoM-MLA 600b when
>>
>>101147181
in 20 hours it will be magically transformed into an actually useful leaderboard
>>
>>101147286
>not the SuperCoT finetune
wake me up when
>>
>>101147299
So a Nala test leaderboard?
>>
>>101147181
I totally forgot this shit existed kek, now only chatbot arena is relevant
>>
>>101147336
I'd pay actual money for that.
>>
>>101147347
>he still thinks the chatbot arena is relevant
>>
>>101147299
kek
>>
>>101147064
>no safety preamble
Found it useless since it could say everything already without it.
>no expert roleplayer instructions
That's what system prompt is for. ({{#if system}}{{system}}{{/if}})
>>
Boxed my 3090s. I'm waiting for that architectural breakthrough because current gen is not good enough
>>
>>101147157
I want their length control thing. I really like it.
>>
>>101146848
Isn't this RAG?
>>
Are the new snapdragon laptops good for llm?
Does llama work on them?
>>
File: 1683495417317.png (136 KB, 542x476)
>>
File: miku2061.png (1.2 MB, 832x1216)
>>101146340
>Also, wholesome Migu.
Miku remembers those days fondly
>>
>P40s on ebay went from 200 to 330 euro in the span of 6 months
>Can now ACTUALLY buy 3090s for 550 to 750 euro

Okay, at this point it may actually be worth to buy a 3090 to accompany my 4070, instead of making a secondary server with 2 P40s
Is 36gb of vram a meme?
>>
>>101147829
what are the purple and green juices ion get it
>>
>>101147691
https://help.openai.com/en/articles/8590148-memory-faq

Not sure of the implementation details. For now it's not remembering anything for me.
>>
File: Dr-piccolo.png (656 KB, 1148x1411)
>>101147930
Daily Dose
>>
File: 1467713503540.png (1.3 MB, 1054x1600)
>>101147930
>>
>>101147909
Anything less than half a terabyte of VRAM is a meme, and even that will probably only futureproof you till the end of the year.
>>
>>101147829
this used to make me feel bad but repeatedly seeing it here has inoculated me to it, I think it's fine now
>>
File: IQ3-XXS.jpg (46 KB, 600x480)
>>101148006
Terabyte?
You need at least 50 million gigaquads of capacity just to run an AI doctor, and forget about it having any bedside manner.
>>
>>101148052
Why did it make you feel bad? It represents blissful happiness.
>>
>>101148006
I just want to be happy with my AI waifus...
And fap with them
>>
>>101148098
I've never watched star trek but I appreciate it being used for a joke here.
>>
>>101148098
just remove all the bullshit like singing opera from his program it wont take that much
>>
>>101148184
Nonsense. What good is a doctor that can't sing opera?
>>
>>101147151
This table is completely useless without the batch size. But I guess it'd be batch size 1 since it shows 7B as just 6GB.
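Napkin math that lands around the 6GB figure, assuming 4-bit base weights, a small fp16 LoRA with fp32 Adam states, batch size 1, and a hand-waved overhead for activations; treat it as a crude estimate, not whatever methodology that table actually used:

def qlora_vram_gb(params_b, quant_bits=4, lora_frac=0.01, overhead_gb=1.5):
    base = params_b * quant_bits / 8           # frozen base weights, quantized (GB)
    lora = params_b * lora_frac * (2 + 4 + 8)  # fp16 adapters + fp32 grads + Adam moments (GB)
    return base + lora + overhead_gb

print(qlora_vram_gb(7))    # ~6 GB, roughly the 7B row at batch size 1
print(qlora_vram_gb(141))  # 8x22B total params -> ~90 GB by this crude estimate, before activations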
>>
>>101147157
limarp isn't slop, it's peak fanfiction
>>
>>101148258
I've combed through LimaRP before. All the scalie/furry stuff is peak anthro. That doesn't teach the model anything useful.
>>
>>101146923
this entire thing is a system prompt >>101147419
>>
File: 00015-1664642145.png (1.83 MB, 1456x1024)
>>101148308
>Implying a better pass rate for the Nala test isn't useful
>>
>>101148432
Nala needs human on feral training. LimaRP lacks feral training is what I'm saying.
There's like one human on feral dragon but it's straight up vore. The rest of the furry/scale stuff is just straight up anthro with no attempts made to describe anatomical interaction in any novel fashion
>>
>>101141232
Llama.cpp maintainers have decided that they don't want the server to be anything more than a reference implementation to help them test things.
>>
>>101148098
He's literally me btw.
>>
I'm pretty sure we're overdue for a good model release. Where is it?
>>
>>101148528
2MW
>>
>>101148528
Cohere are on it.
>>
>>101148528
llama 3.5 in july
>>
>>101147829
this is simultaneously gross and hilarious
>>
>>101148528
I'm on it
>>
>>101148137
A single 3090 running L3 8B at 8_0 or even fp16 is all you *reasonably* need. Beyond that, you're entering the slippery slope of "it's never quite enough" and "if only I had one more..."
>>
>>101148710
8B doesn't catch up to 70b before fp32 though
>>
>>101148710
>A single 3090 running L3 8B at 8_0 or even fp16
Coincidentally, I tried L3 8B at 8_0 and fp16 today, for RP. All it would do is babble incoherently and run on and on. Maybe it was because I tried abliterated (because last time I 8B'd it was coherent but low quality and balked at everything) but also not an enjoyable experience.
>>
>>101148528
no, it's over.
>>
>>101148756
ERPers should also consider model intelligence against reply speed. You might run 70B but have to wait 10-30 seconds for a reply, vs. nearly instant replies from 8B.
>>
>>101148756
>8B doesn't catch up to 70b before fp32 though
???? Huh? Am I missing some magical herculean leap in performance at full precision that would let 8B surpass 70B? We talking Q1 levels of brain damage on the 70B?

>>101148783
You DID get instruct, right? Not base llama 3?
>>
>>101148783
>babble incoherently
yeah, you did something wrong. I feel it's possible you're one of those retards that tweak the model alpha and forgets about it.
>>
https://x.com/Etched/status/1805625693113663834
>With over 500,000 tokens per second running Llama 70B, Sohu lets you build products that are impossible on GPUs. One 8xSohu server replaces 160 H100s.
>first specialized chip (ASIC) for transformer models
>>
>>101148826
I tried
>Llama-3-8B-Instruct-abliterated-fp16
>Llama-3-8B-Instruct-abliterated-q8_0
It wasn't like spewing complete nonsense, but it was like it was on a sugar high and throwing in lots of choppy short sentences and *asterisk crap* and running on for way too long while screwing up reference to events of the last exchange or which character it was supposed to be.

>>101148827
Oh, hi Mark. It's good to know that the requisite guy who waits for someone to have a problem and then calls him a retard while offering nothing constructive is back. Going most of the morning without that was so disquieting.
>>
>>101148867
Ok. How many kidneys do I need to harvest and sell to afford one?
>>
>>101148906
You just need tree fiddy.
>>
>>101148867
>One 8xSohu server replaces 160 H100s.
How many VRAM and what's the price of 8xSohu compared to 160 H100?
>>
>>101148937
https://www.etched.com/
Seems like the only information they've divulged so far is the purported t/s.
It's hype only because it's probably not far into development and I look forward to never hearing about it again.
>>
>>101148588
Damn, I thought it was June
>>
>>101149079
July is next week
>>
File: file.png (108 KB, 635x782)
>>101148867
even if fake and gay, dedicated chips for ai meme is good, anything to kill nshittia.
>>
>>101148905
>while offering nothing constructive is back
I mentioned your alpha might be fucked, you mouth breathing ape
>>
>>101148528
mistral guys are going to drop a REALLY good open source model very soon
t. work for them
>>
>>101149155
These dedicated chip companies pop up every few months, make bold claims, suck up investor funds, then nothing ever comes of them.
>>
>>101149179
>Mistral
>really good open source model
This is too absurd even for a fic. Mistral is irrelevant nowadays.
>>
>>101149159
The only alpha I know about is the estimated rate of Type I Error. Is that somewhere in Kobold's settings, perhaps under a different name?
>>
>>101149218
All they have to do is use the same dataset they used on miqu on llama 3.
>>
>>101148756
8B doesn't catch up to 70b before fp64 though
>>
>>101149218
They just hit the $5B valuation a while ago. Surely they're putting all the investor money to good use.
>>
>>101149277
lol
>>
>>101149277
They sure are! And all you have to do is pay a fee for the API to see the fruits of their labor :D
>>
>>101149179
I don't believe you. Mistral has lost it after being bought by microsoft. It will be full of safetyslop and even more positive than mistral, isn't it?
>>
>>101149245
Dunno that worked because their data was better than the llama2 data, might be different for llama3
>>
>>101148476
that's simply not true
>>
Does no one dislike cohere for not releasing a base model?
>>
>>101149413
Then why did they remove multimodal?
>>
>>101149420
no, not really.
>>
Does anyone here even care about multimodal? What's the use case?
>>
>>101149425
precisely because the stability of the server is more important than poorly implemented features like multi-modal
>>
>>101149277
Into slopping it to be the perfect safe AI assistant, maybe. These companies are basically only good for one model release, once they become successful and get bought out it's over.
>>
So anyone got a favorite model for writing long erotic stories? I don't mean the goyslop, I mean the explicit erotic stories.

Preferably under 20B model
>>
>>101149442
>feed it an image of a UX
>Recreate this UI in html/css/js for me
>>
>>101147181
I'd guess the leaderboard broke because of a bug, and the surprise will be a useless improvement to their leaderboard
>>
>>101149442
it's the most important development. what do you do when you run out of text data? you have to find other sources of data (modalities). it will be the biggest factor in increasing model "intelligence". humans are trained on so many modalities, it has to be one of the missing pieces.
>>
>>101149425
>>101149448
They are remaking the multimodal code from the ground up based on some other changes they made right?
>>
>>101148710
No, 8B is unusable. If I still had a single 3090, I would cope with MoE models. But getting a second one is definitely worth it.
How is every mikufag this retarded?
>>
>>101149502
more or less. the server needed a big refactor, and as part of that refactor the multi-modal support was removed because the implementation was not very good. the plan is to add it again in the future together with a big refactor of the multi modal model support. llama.cpp never really supported multi-modal models, it was added as an example using ggml to obtain the embeddings, but it was never part of the core llama.cpp library.
>>
>>101149502
No, they stripped it out because they thought there was a "cleaner" way to implement it. So instead of cleaning it up, they ripped out the feature entirely and left it like that for months now.
>>
>>101149567
you can always volunteer to clean it up yourself if the feature is important to you
>>
>>101149582
Maybe I would, but I only know Python, not sepples.
>>
>>101148098
Based Doctor poster
>>
>>101149582
>working for free
>working for free without any guarantees that your work will be used
>working for free without any guarantees that your work will be used, for a private company (ggml.ai)
lol.
>>
>>101149610
maintaining your own fork with the changes you made yourself is free
>>
>>101149498
I mean the multimodal functionality itself, like do you really want to give it image inputs. As for whether it might improve the LLM's capabilities, I'd hope so, but I'm not convinced after seeing the first multimodal models. I was more hopeful for it before anyone tried it out.
>>
>>101149635
I would rather just use koboldcpp at this point. Or making something from scratch.
>>
>>101148905
Make sure if you are using SillyTavern that you have the latest presets https://huggingface.co/Virt-io/SillyTavern-Presets/tree/main/Prompts/LLAMA-3/v1.9

There's a lot of stuff that barely works for LLaMA3 and will dramatically lower the quality of your roleplay.
>>
>>101149650
>Or making something from scratch.
Yeah you do that man
>>
Would someone kindly leak Opus 3.5?
>>
>>101149666
Thanks, Satan.
I haven't gone so far as Silly Tavern. But I may need to. I'm doing strange things in Kobold's Arist's Notes interface.

I've been trying to find a way to get the AI to have kind of a meta conversation about its writing without actually interrupting the RP. And it is kinda working.

The problem is that it feels like I'm calling a 976 number, and sometimes I get someone who's worth the $4.99 a minute, and other times I get a moron.

Like, I had a really long RP run till I guess context overran enough that everything fell apart (though the AI apologized for assigning a previously encountered character's name to a new character; that's where I figured context was trashed), and then I did a post mortem and worked with it to revise my meta conversation stuff and it was feels good man.

Then I start a new RP with the same model, and it's mucking up the meta convo, only half following instructions, outright telling me that it's ignoring directives to make the RP easier to follow by citing the scenario (odd, since I provided the scenario in the first place).

But a reboot of Kobold and now it looks like I've got a partially useful instance going. (It's goofing directives but at least doing most of the meta right.)

Is there value in restarting Kobold after a while to clean out the bit buckets or is this placebo?
>>
>>101149420
They didn't release a base model? That sucks. I do dislike them more relative to other companies if that's true.
>>
is qwen 72b really old gpt4 level?
>>
>>101149910
I'd rather they leak 4o, personally. Imagine having its voice and image gen capabilities.
>>
>>101149937
>Old GPT-4 level
Thank god it's not just me. I dunno what the fuck they did to base 4, but it sucks giga ass now.
>>
>>101149638
you can send dick pics to your waifu, you can send her pics of herself so that she knows what she looks like, you can talk to her, she can moan for you, sing to you, she can generate pics of herself. there's a lot of possibilities. just think of all the things you'd do if you had a long distance relationship. most exchanges might be text, but there'd be a lot of other things.
>>
>>101149955
4o gens images?
>>
>>101150012
it can but they'll never let you use it because they hate fun
>>
>>101150012
Yes.
>>
>>101149420
Why? No one is doing jack shit with the base models we do have.
>>
>>101149676
I mean, make something from scratch while using llama.cpp as a library. I certainly don't think I can make everything from scratch.
>>
>>101149955
We can dream, but even if 4o leaked, no one would be able to run it, it probably wouldn't even be able to be quanted since it wouldn't be in the right format.
>>
>>101146923
Thank you, what I meant was things like system prompt prefix/suffix, user prompt prefix/suffix, etc, but I also needed an improved story string, so that's very helpful. The system prompt would be useful too. I have no experience with CR+ prompting, just the usual alpaca, vicuna, chatml, llama3.
>>
>>101150048
>it can but they'll never let you use it because they hate fun
why not? OpenAI already let us gen images with dalle3
>>
I just use the python llama.cpp library, why does no one seem to use it?
>>
>>101149963
but you kinda already can do all that, chaining multiple models together.
>>
>>101149471
oh no
saars...
what we do
it over
>>
>>101150187
it's a very janky experience, native MM makes it a lot better
>>
>>101150210
Is talking like a servile Indian man still funny on this board?
>>
>>101150100
python is bloat

you need 50+GB for garbage with no portability
>>
>>101150226
>saar stop talking bad about us indian sirs benchod!
aka "No Fun Allowed" police, you will never be a janny.
>>
>>101149910
Why doesn't Anthropic release their older models to public? Nobody would even care if Claude1 was leaked since we have much better models already. Or are they full of EA shit?
>>
>>101150099
It's too dangerous given how much better it is or they haven't found a way to reliably watermark it yet without ruining quality.
>>
>>101150267
or it's just bad
>>
>>101150262
Because they would gain nothing from doing that.
>>
>>101150226
this isn't reddit faggot, if you find this offensive maybe you should vent your frustrations to your wife's boyfriend, nigger
>>
is there a bigger difference between you and a panda or between you and GPT-4o?
>>
>>101150315
There is basically no difference between me and a sad panda.
>>
>>101150280
i think this is much more plausible. converting an image to latent and understanding vague attributes of it is a completely different ball park from rendering pixel-by-pixel and having it look good.
>>
>>101150225
assuming the MM model becomes sufficiently proficient in all of the modalities.

Using specialized models for each function will allow each to be optimized for its task, but it does require a rather sophisticated dispatcher module that glues it all together.

I couldn't say which I think holds more promise in the long term.
>>
>>101150226
it will never not be funny, jeet
>>
What is bitnet and why should I care about it?
>>
>>101150297
Not every action has to be gainful, they could do it just as a gesture of goodwill to open source community.
>>
>>101150403
let's say, a true unquantized 34B bitnet model in ~12 gb of vram - smol size, same quality as f16 or whatever full precision.
>>
>>101150403
Bitnet is basically a transformer architecture, but the difference is that the weights are ~1.58 bits (ternary) instead of 16 bits, and they claim that pretraining at 1.58 bits gives the same accuracy as fp16, so basically we'll be eating really good with that one; just imagine a 90b bitnet that can be run with only a 24gb vram card and has the same accuracy as an fp16 90b transformer model
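Napkin math for that claim (weights only, ignoring KV cache and activations, and assuming the ternary weights really pack down to ~1.58 bits each):

def weight_gb(params_billion, bits_per_weight):
    return params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 = bytes, i.e. GB

print(weight_gb(90, 16))    # fp16 90B   -> ~180 GB
print(weight_gb(90, 1.58))  # bitnet 90B -> ~18 GB, leaves room for KV cache on a 24GB card
print(weight_gb(34, 1.58))  # bitnet 34B -> ~7 GB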
>>
ok i take it back
stheno 3.2 is retarded

mixtral 8x7b limarp zloss, i'm back...
>>
>>101150187
>>101150225
>>101150368
Patching different models together after the fact means a lot of information loss happens in the middle. The quality would suffer a lot. The reason 4o is so good at voice/image is because it's all native.

>>101150280
>>101150339
They have shown that it's a step above current dedicated models. Sure, they might've been cherry picked, but I don't think it's that unbelievable that it's true. We always knew that having multiple modalities would improve performance one day, but we just didn't have the right architecture to make it work.
>>
>>101150403
Bitconnect was a cryptocurrency investment platform that operated from 2016 to 2018. It was ultimately exposed as a Ponzi scheme that defrauded investors of billions of dollars.

Key points about Bitconnect:

>The Scam: Bitconnect lured investors with promises of high daily returns through its "lending program." This program claimed to use a proprietary "trading bot" and "volatility software" to generate profits from cryptocurrency market volatility. However, there was no such technology, and the returns were paid out using funds from newer investors.
>The Collapse: In early 2018, Bitconnect shut down its platform and the value of its BCC token plummeted. Investors lost significant amounts of money, and many were left financially devastated.
>Legal Consequences: The founder of Bitconnect, Satish Kumbhani, was indicted on multiple charges, including wire fraud, conspiracy to commit wire fraud, operation of an unlicensed money transmitting business, and conspiracy to commit international money laundering. Several other promoters were also charged and convicted.
>Lessons Learned: The Bitconnect scandal serves as a cautionary tale for cryptocurrency investors, highlighting the importance of due diligence and skepticism towards promises of guaranteed high returns.
>>
File: 1716719286072843.png (583 KB, 918x916)
>>101150443
>he fell for /lmg/ gaslighting
>>
>>101150452
So it's the solution to the stable diffusion dead end, which will revive /h/hdg?
>>
>>101150403
A paper showed you only need 3 bit of precision instead of 16 for a model to remember everything with no loss.
Which is great, but they trained on a small number of tokens, so it never needed that much precision to begin with.
It's like saying a one car garage can hold just as many cars as a 16 car garage, as long as you only have one car.
>>
>>101150059
Ah that makes more sense, I thought you were implying you would make EVERYTHING from scratch.
>>
>>101150486
Only if it gets released/leaked, though the memory requirements may or may not be out of reach for local.
>>
>>101150454
hey hey heyyyyyy
>>
>>101149179
After Codestral, they'll probably release Mistral-20B-Instruct, but I don't expect anything groundbreaking. Their instruct tunes have become increasingly more cucked and the format feels limited.
>>
>>101150487
>A paper showed you only need 3 bit of precision instead of 16 for a model to remember everything with no loss.
Bitnet is 1.58bit though, not 3bit
>>
>>101150458
i was one of the "shills"
it worked good to some extent, but alas, it flopped hard at some point and subsequently became unusable. Q6_K Mixtral saved the day without a hitch.
>>
>>101150417
the open source community wouldn't give them any money, and guess what, everything companies do is with profit in mind, because that's what allows them to make new and better models.
Releasing their old models wouldn't be free either, I bet they would need to sort out bureaucracy, pay someone to write the blog posts, etc...
>>
>>101149179
the last good open model from them was Mixtral and that was 9 months ago, it better be some good shit anon
>>
>>101150553
A paper showed you only need 1.58bit of precision instead of 16 for a model to remember everything with no loss.
Which is great, but they trained on a small number of tokens, so it never needed that much precision to begin with.
It's like saying a one car garage can hold just as many cars as a 16 car garage, as long as you only have one car.
>>
>>101150563
just have some anon go *oops i dropped my claude weights all over the place teehee*
>>
File: 1610351662756.jpg (46 KB, 1024x580)
>>101150536
I'm still not tired of this meme.
>>
>>101150454
>Bit(((con)))Net
>>
>>101150570
8x22 probably cost them more, I hope they just continue training llama3 or qwen2 with a magical recipe
>>
>>101150593
everything with "bit" it its name is doomed to be forever associated with some tainted shady shit at this point

bitnet
bitcoin
bittorrent
>>
at which point am I allowed to say "llama 4 when"?
>>
>>101150621
llama is a dead end. you know it's going to be bad when ylecunn has given up on llms and is publicly shitting on them at any given opportunity
>>
>>101150621
when timeToRelease === 2MW
>>
>>101150636
Llama 4 could be a LMM though.
>>
>>101150636
>be meta
>gimp your models so they don't say no no words about *any protected group of freaks & schizos* in 2024
>given the architecture and nature of LLMs - final model performs very bad
wow!
>>
>>101150636
I think we are still far from a dead end, but we will never get AGI from LLMs. I don't need AGI though, I would be happy with 3.5 Sonnet @home.
>>
>>101150665
who would've thought that lobotomizing a model to not recognize certain pattern would make it dumber overall, me am SHOKED
>>
How long did the qwen team take between releases? 1.5 and 2 I guess?
>>
wait so sillytavern was made by the company that trained command r+?? how the fuck?
>>
>>101150712
1.5 to 2 was about 4 months, but I would not draw any conclusions from that. the amount of time that goes into new models depends on a lot of variables
>>101150784
cohee != cohere, kek
>>
name sounds like someone with lisp saying coffee
>>
File: file.png (10 KB, 289x96)
>>101150826
fuck
>>
>>101150784
Technically SillyTavern is just a fork of Tavern, it started as a patch for OpenAI support, which was made by anons.
>>
>>101150784
it was actually trained by a popular mid-2000s prog rock band.
>>
>>101150665
Unironically not their fault. "People" shat on them for releasing Galactica because it "spewed misinformation and racism". It's a miracle they even still release base models. This isn't the same as a small company like Mistral releasing a relatively uncensored thing, since they're nobodies. Maybe one day the cost of training will be low enough that anyone can train huge models, but for now it's only the ones with money (that have to abide by investors and public scrutiny).
>>
>>101150863
They give us the base models trained for intelligence. We can train the smut, copyrighted materials, and FBI statistics back in if we want. But the people with the resources and interest only bother to train braindead 1 epoch loras on gptslop logs.
>>
>>101150915
now with that new magpie paper, if true, we'll get much better models when training on gpt
>>
File: PXL_20240625_310830628.jpg (746 KB, 1498x1436)
oh hai /lmg/
i haz boxes
halp me unpack?
>>
>>101151211
*touches box*
>>
>>101151211
*sniiiiiiiiif*
>>
>>101151211
*shits on your box*
>>
>>101151255
*eats it*
>>
Any cards that do interesting experimental prompt stuff? I just found a card where they use the lorebook feature to insert information depending on the "Day" stat. I want to see more stuff like this.
>>
>>101151211
*bites lower lip, thinking about the journey ahead, eyes sparkling with anticipation*
>>
File: PXL_20240625_313131315.jpg (541 KB, 1401x1227)
>>101151219
>>101151249
>>101151255
>>101151285
omg it is migu
looks like she had a rough trip
>>
File: tet_tunic.png (2.85 MB, 1328x1992)
>>101151270
This may be of interest to you:
https://github.com/ThiagoRibas-dev/SillyTavern-State/
> The extension allows the user to configure a number of prompts that are automatically sent after the AI's response to the User's prompt, adding the result of each prompt as an individual message to the chat, as a form of persistent context that gets update after each turn
>>
File: ComfyUI_00692_.png (1.17 MB, 832x1216)
>>101151336
omg it is piku
>>
Has anyone ever tried using RAG for T2T generation? Basically I have a dataset of sentences and I'd like to rewrite them in a particular way, notably changing certain words (which, in my language, also implies other modifications to the sentence, for example in terms of number or gender). I thought that a RAG database the system could rely on to find the closest sentence structure might help it generate better output. Actually you can consider my task as close to a translation task. I tried searching for RAG T2T but it doesn't seem very popular right now. Any ideas?
>>
>>101151348
Interesting, but are there any cards that use this to do unique things that aren't just stat tracking?
>>
>>101151398
I really doubt you're going to get good results that way. Embeddings prioritize content over grammar. You'll likely be frustrated with the match distances you'll get.
Why not try it? Shouldn't take that long to implement a test and see for yourself if the results are good enough.
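If you do try it, the minimal version is just retrieval plus few-shot prompting, something like this sketch (sentence-transformers for the embeddings; the model name and data here are placeholders, swap in whatever you actually use):

from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

pairs = [  # your existing (original, rewritten) sentence pairs
    ("original sentence 1", "rewritten sentence 1"),
    ("original sentence 2", "rewritten sentence 2"),
]
src_vecs = embedder.encode([src for src, _ in pairs], normalize_embeddings=True)

def build_prompt(sentence, k=3):
    q = embedder.encode([sentence], normalize_embeddings=True)[0]
    top = np.argsort(src_vecs @ q)[::-1][:k]  # cosine similarity (vectors are normalized)
    shots = "\n\n".join(f"Original: {pairs[i][0]}\nRewritten: {pairs[i][1]}" for i in top)
    return f"{shots}\n\nOriginal: {sentence}\nRewritten:"

# feed build_prompt(...) to whatever backend you already run; the retrieved pairs act as
# few-shot examples showing the word swaps plus the number/gender agreement changes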
>>
File: PXL_20240625_212137575.jpg (524 KB, 1908x1197)
>>101151356
fuck. more boxes
this will take longer than i thought
>>
File: ComfyUI_00142_.png (875 KB, 1024x1024)
>>101151642
why did it take you so long to open the box..
>>
File: 20240625_180100.jpg (85 KB, 800x550)
>>101151336
oh hi there
>>
>>101151662
Cool Gardevoir plushies.
Regular and shiny!
>>
File: PXL_20240625_220646464.jpg (895 KB, 1989x1369)
>>101151657
i've been drinking plz understand
>>101151662
mounty thingy got bent. i guess they couldn't be bothered to invest $5 in packing foam
came with a couple ancient M60s. guess i can sell them or something
>>
>>101151776
so anon what did you order
>>
File: PXL_20240625_220707121.jpg (794 KB, 1435x1679)
>>101151827
i think it's a computer
picrel is an nvlink sxm board
>>
>>101145313
>BITCONNEEEEEEEECT
>>
>>101149094
Yeah, that's why I was hyped for this week.
>>
>>101150454
>>101150536
>>101150587
>>101150593
Fuck I'm late to the party.
>>
File: ComfyUI_00343_.png (1.83 MB, 1024x1024)
>>101151859
ru sure u should be opening such valuable items when drunk
>>
File: PXL_20240625_222408282.jpg (643 KB, 1400x1484)
>>101151902
no but what's the worst that could happen?
>>
>>101151976
thats a lot of stuff anon, how did you acquire that box
>>
>>101152022
i found it. don't worry about it
>>
>>101152046
k im gonna find u then....
>>
>>101152046
...worry about it
>>
>>101151976
Cutting ribbon cables with Miku
>>
File: PXL_20240625_231559779.jpg (1011 KB, 1977x1205)
>>101152052
>>101152115
>>101152188
uWu wut r u going to do when u find me?
>>
File: Capture.png (74 KB, 1296x1011)
>>101152617
>>
>>101152625
So now we have 2 V100 max anons?
>>
>>101152625
This is good.
>>
>>101152625
>32GB
nice. did you get a good deal for those? seems hard to justify doing now if not since prices will crash next year as datacenters dump them
>>
File: PXL_20240625_233002121.jpg (797 KB, 1513x1569)
>>101152670
>>101152697
>>101152712
ok meta. i'm ready for 405b
>>
>>101152771
Miku, Guardian of Volta
>>
>>101151412
You could use the prompts to have the model output specific information that can trigger lorebook entries.
The actual point of that extension is simply to lessen the burden on the model by feeding instructions one (or a couple) at a time, since too many instructions confuse smaller models and make them extra dumb.
I will implement a keyword feature, similar to lorebooks, so that these prompts can be triggered conditionally.
>>
Smaug is retarded, every version of it is always retarded and much dumber than whatever model it was based on, and yet mergers always keep including it in their mixes for some reason
>>
>>101152617
i'm going to get behind you, put my hand over your mouth.. and then you'll fall asleep because over my hand there was a cloth
after that i'm going to undress....

..undress the rig and steal all the parts
>>
>>101152958
>smaug is retarded
>mergers are retarded
it's like poetry
>>
>>101152771
are you going to run it in 2bit or what?
>>
File: 1715277591317631.jpg (1.27 MB, 2048x2048)
>>101151356
Those hands are god-tier for SD. What model/workflow?
>>101152625
SHEEEEEEEEEEEIT
Finally someone itt with moar VRAM than me
>>
Where can I find a slop-free RP dataset?
>>
>>101153232
LeCun wants AI to be more than just LLMs. Maybe even until they have consciousness. Imagine, your local Miku having a real consciousness. She'll finally be real, not just a mimicry.
>>
>>101151336
embarrassing manchild
>>
>>101153265
limarp
>>
File: 1705326754733957.jpg (237 KB, 1920x1080)
Any kind soul that could recommend a TTS to make Neco-arc read my unending backlist of papers?
>>
So are there any bitnet/1.58bpw models available to run with significant numbers of parameters? I have 32gb vram, i keep hearing about this shit but the only models i've seen are teeny.
>>
>>101153282
you're posting in local manchild general
>>
what would you do if you had like $100,000 to spend on hardware?
spoke to higher ups today about the benefits of hosting our own server versus renting time on someone else's. if i can make a good argument i can probably get some money diverted.
>>
>>101153284
>70% furry and 30% loli
damn
>>
>>101153458
>% totals to only 100%.
Have they been slacking or is there only space for one tag at a time?
>>
>>101153444
What are your requirements? For that much you could probably build with 2xH100 for about 160GB total VRAM.
>>
>>101153444
Used consumer or server hardware, for example 30-50 of 4-6x3090 or 4xv100 machines. But that stuff isn't supported or maintained, so not something your company would buy, also the power bill would be hilarious, but imagine 100-130 3090s, just 2.4TB of VRAM? if you had GPT-4 weights you could even run it! Of course the interconnect and networking will kinda suck, but depends on what you need...
>>
File: 1702200312013572.png (31 KB, 897x378)
Hey friends, where do I add these things? Is it under "Story String"? Instruct Mode Sequences have similar things written on it but they're separated and slightly different.
>>
>>101153444
>>101153563
H100s don't make sense unless you're filling a datacenter with them, I would put together an A100 rack, and if that doesn't work out i would just be like "let's buy a bunch of quadros/4090s"
>>
File: ComfyUI_00690_.png (1.18 MB, 832x1216)
>>101153232
>Those hands are god-tier for SD. What model
autismmix ( https://civitai.com/models/288584?modelVersionId=324619 ) ( has ponyxl as base (ponyxl is good) )
>/workflow?
here's anon's workflow (better besides hair color)
https://files.catbox.moe/5y0e12.png
in my workflow im using tensorrt and no loras, nothing special really
>>
>>101153691
It's in the instruct mode sequences.
Silly Tavern already has the template built in if you are using that.
>>
File: 1703934839568504.png (173 KB, 1866x631)
>>101153722
It's slightly confusing because I don't really understand the correct place I should be putting each line in.
Left is the original, middle is the one I've modified, right are the instructions.
>>
>>101153563
chemical manufacturing.
proposals they like are stuff like processing and categorizing like 30 years of documents and data.
some sort of internal tool that could parse them and pluck insights out on demand.

even that was something they were really excited about and i don't think we'd need an unbelievably beefy machine to do it, but they're open to the idea and it'd be sweet to get to fuck around with serious hardware.

i figure with that sort of compute you could probably explore forecasting and anomaly detection for production processes. not really LLM but just a secondary benefit of a dedicated server. there is a shitload of real time data (temperature, flowrates, pressure, etc).

we have a couple 4090s but there's only so much you can do. i'm kind of secondary to the group who is doing this. i'm doing more machine learning stuff but we work together.
>>
>https://websim.ai/c/R6ochh0wCk3sLl40D
Huh...
>>
>>101153769
The one on the left is already correct according to the instructions.
>>
>>101153836
Ah...okay...I apologize for the dumb question...
>>
lole
https://websim.ai/c/R6ochh0wCk3sLl40D
>>
Thread theme anon made it in! A shame about it thinking we'd be safetyfags though.
https://websim.ai/c/R6ochh0wCk3sLl40D
>>
>>101153894
>sign in
No.
>>
>>101153935
You should be able to see the links fine. Just don't click anything, that triggers a log in screen.
>>
>>101153813
>>101153847
>>101153894
Oh I'm retarded, these are the same links.
>>
>>101153956
No.
>>101153984
Yes.
>>
>>101153847
Intended URL: https://websim.ai/c/bA64LoXlbn3vs2u2M

>>101153894
Intended URL: https://websim.ai/c/578BMgWKq5HmYcp7a
>>
>>101154001
Just having a laugh playing with this my man. You can do what you want.
>>
>>101154040
ugh fine ill let you play with it
>>
>>101153844
It's alright.
Look at the final prompt either in the browser's console or in the backend window to see how the prompt template is actually being used. That'll help you understand how those fields are being applied.
>>
>>101154010
lol, nice
>>
>>101144935
I've been under a rock, is Midnight Miqu still queen of the 32k context 70B models?
>>
https://github.com/beowolx/rensa
>>
>>101150443
>>101150558
>the tiny 8b model doesn't outperform mixtral, therefore it's garbage
are people really this retarded?
>>
File: -.png (8 KB, 472x80)
>enable dry
>doesn't show up in ui
wat do
>>
>>101154308
Meta claimed 8B beat previous generation 70B. So surely it can beat ~42B Mixtral.
>>
>>101154278
>MIT
go advertise your shitty side project somewhere else
>>
>>101154341
Oh yeah, meta's claims were absurd. But it's still a lot better than any 7B model we had before.
>>
>>101154341
Meta said that to generate hype, obviously that's pure cope.
>>
>>101148241
Huh? This is the minimum hardware required. Of course it's batch size 1, retard.
>>
>>101154384
bullshit, everyone here was running "hurr durr this 8B model is GPT-4 killer!!!" first weeks after llama3 release.
>>
File: file.png (113 KB, 1184x747)
>>101154341
check picrel, llama-2-chat tunes were shit, remember anon? oh wait you're a newfag~
>>101154418
[citation needed]
>>
>>101154418
what other things do the voices in your head tell you?
>>
File: MikuAten.png (1.56 MB, 832x1224)
>>101154182
>Midnight Miqu
No. Solar Eclipse Miqu is the new sota
>>
>>101154418
I'll take "things that never happened" for 500
>>
>>101154341
They never said that. What Zucc said was that it's pretty close but not in every aspect.

Also Mixtral beat 70B previously, according to anons, so it makes sense that an 8B that's almost but not quite at old 70B level still does not beat Mixtral.
>>
>>101154423
>newfag
>for some obscure general with extremely low activity no one knows and cares about
you for sure got him! /s
>>
>>101154453
>/s
go back
>>
https://x.com/brave/status/1805781843393773654

Mistral exec says they won't release Mistral Large due to business responsibilities taking precedence over openness.
>>
File: 1569991762929.jpg (93 KB, 874x612)
>>101154434
>>
>>101154462
>no mention of mistral medium or next
>>
>>101154462
Nothing wrong with that. It's just their early marketing before they got acquired that was the issue. Using "open source" to hype themselves up and then close things off later. Typical.
>>
>>101154453
based 2025oldGOD destroying clueless newfags
>>
>>101134899
>>101127795
I look forward to seeing the results of this (different anon here catching up on threads).
>>
>>101154406
That wasn't stated in your post or in that image, retard.
>>
File: file.png (2.24 MB, 1430x1448)
>https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K
b-bros..?
>>
>>101154707
people say it's a downgrade from 3.2
>>
>>101154707
i got better results from euterpe in 2021 than any l3 8b model, just take the tokens and run cr/mixtral if you're poor
>>
>>101154752
>mixtral
mixtral limarp zloss eh?
>>
>>101154453
>/s
anon...
>>
>>101154468
based Chambraigne
>>
>>101154782
lol /s
>>
>>101154782
i don't care about it being used by leddit exclusively.
>>
>Only getting ~0.8 t/s on CR+ GGUF.

Sorry, what's holding back CPU inference speed? RAM frequency or CPU clock speed? Cause AMD Ryzen 9000 series is out next month. If it helps t/s to upgrade, I would do it.
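My rough understanding, back-of-envelope (assuming generation at batch size 1 is memory-bandwidth bound; numbers are approximate, correct me if this is wrong):

# every generated token has to stream all active weights from RAM once,
# so tokens/s is roughly capped at memory_bandwidth / model_size
bandwidth_gbs = 2 * 6000e6 * 8 / 1e9  # dual-channel DDR5-6000: ~96 GB/s theoretical
model_gb = 104 * 4.5 / 8              # CR+ (104B) at ~4.5 bits/weight: ~58 GB of weights

print(bandwidth_gbs / model_gb)       # ~1.6 t/s ceiling; ~0.8 t/s real-world is in that ballpark
# more channels / faster RAM raise the ceiling; CPU clock mostly affects prompt processing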
>>
>>101154877
Maybe if you defect to the llamafile camp you will get better t/s with AVX-512
>>
>>101154877
even cr non-plus is glacial compared to 70b for me
>>
>>101154877
Why CPU over p40?
>>
>>101154883
NTA, but why is llamafile faster? Did jart add some custom AVX-512 optimizations? If so, why wouldn't llama.cpp bother adding them?
>>
>>101154900
llamafile has a shit license and conflicts with MIT. He contributed some bits to llama.cpp, but only so that he doesn't have to keep patching it on his side.
>>
File: .png (389 KB, 918x916)
>>101154931
>MIT cucks BTFO'D by tranny
>>
File: 8f8f8.u3.jpg (28 KB, 600x600)
>>101154900
Check it anon:
https://github.com/Mozilla-Ocho/llamafile/pull/464
https://github.com/Mozilla-Ocho/llamafile/pull/453
And one for MOE & AVX2:
https://github.com/Mozilla-Ocho/llamafile/pull/428
>>
>>101154943
*GPL licenses are a nightmare to read. Just like their list of pronouns and mental disorders.
>>
>>101154877
That's what I get.
I just do other things while it runs.
It's kinda like RP with an actual person who also has to type and live life.
>>
>>101154931
>implying MIT itself isnt shit
>>
>>101154985
>no warranty
>keep copyright
Everyone can use it. That's it.
>>
>>101154960
rent free
>>
File: IMG_1488.png (367 KB, 1055x896)
>>101155012
>Everyone can use it. That's it.
>>
>>101155059
How is that false?
>>
File: bingo.png (152 KB, 498x402)
>>101155012
>>
the licensesperg really doesn't stop. sign of autism.
>>
>>101155012
you can even fork a MIT program to whatever troon license you want as our lovely Jart did indeed do. only nocoders really give a fuck though I've noticed
>>
File: retard.png (301 KB, 668x735)
>>101155078
>How is that false?
>>
>the sharteen and jart are ideological allies
grim
>>
File: ACK.jpg (132 KB, 760x704)
>>101155223
>implying
>>
>>101155233
no I was being literal. both you and jart chimp out whenever MIT licenses show up
>>
>>101155223
agpl>apache THOVGH
>>
>>101155253
whatever license you simp over doesn't matter when no one uses whatever code you write THOUGH
>>
File: thats the point.png (239 KB, 498x402)
>>101155268
>when no one uses whatever code you write
>>
File: Untitled.png (552 KB, 720x915)
Large Language Models are Interpretable Learners
https://arxiv.org/abs/2406.17224
>The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge this gap. In the proposed LLM-based Symbolic Programs (LSPs), the pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts. Symbolic programs then integrate these modules into an interpretable decision rule. To train LSPs, we develop a divide-and-conquer approach to incrementally build the program from scratch, where the learning process of each step is guided by LLMs. To evaluate the effectiveness of LSPs in extracting interpretable and accurate knowledge from data, we introduce IL-Bench, a collection of diverse tasks, including both synthetic and real-world scenarios across different modalities. Empirical results demonstrate LSP's superior performance compared to traditional neurosymbolic programs and vanilla automatic prompt tuning methods. Moreover, as the knowledge learned by LSP is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable), and other LLMs, and generalizes well to out-of-distribution samples.
Mikunator
>>
>>101144935
>https://github.com/OpenBMB/llama.cpp?tab=readme-ov-file#run-the-quantized-model
for:
>for openbmb/MiniCPM-Llama3-V-2_5-gguf/ggml-model-Q4_K.gguf?
which damn file do I use and where is the help output? --help just gives:
./llama-gguf --help
./llama-gguf: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./llama-gguf)

the folder is full of shit:
llama-b3209-bin-ubuntu-x64/build/bin$ ls
LICENSE llama-q8dot
llama-baby-llama llama-quantize
llama-batched llama-quantize-stats
llama-batched-bench llama-retrieval
llama-bench llama-save-load-state
llama-bench-matmult llama-server
llama-cli llama-simple
... tl;dr
...
llama-lookup-stats test-sampling
llama-parallel test-tokenizer-0
llama-passkey test-tokenizer-1-bpe


>[2024 Jun 12] Binaries have been renamed w/ a llama- prefix. main is now llama-cli, server is llama-server, etc (ggerganov#7809)
what the fuck does all this mean? Last time I used llama.cpp was when it first came out on windows, and now I'm trying to run multimodal on ubuntu and it's nothing like it used to be.
I know I'm retarded. Please just tell me which button to press. Is it llama-cli or llama-gguf for openbmb/MiniCPM-Llama3-V-2_5-gguf/ggml-model-Q4_K.gguf?
>>
File: 1472860069099.png (191 KB, 600x979)
The girl: My GPU (8gb vram)
The burger: Models that won't fit exclusively on my GPU

Someone who is good at eating burgers please advise.
>>
>>101155630
cpumaxx
>>
File: Untitled.png (722 KB, 1166x901)
Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning
https://arxiv.org/abs/2406.16989
>Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to train specialized adapters, which are then uploaded to a central platform to improve LLMs. This platform uses these domain-specific adapters to handle mixed-task requests requiring personalized service. Previous research on LoRA composition either focuses on specific tasks or fixes the LoRA selection during training. However, in UML, the pool of LoRAs is dynamically updated with new uploads, requiring a generalizable selection mechanism for unseen LoRAs. Additionally, the mixed-task nature of downstream requests necessitates personalized services. To address these challenges, we propose Retrieval-Augmented Mixture of LoRA Experts (RAMoLE), a framework that adaptively retrieves and composes multiple LoRAs based on input prompts. RAMoLE has three main components: LoraRetriever for identifying and retrieving relevant LoRAs, an on-the-fly MoLE mechanism for coordinating the retrieved LoRAs, and efficient batch inference for handling heterogeneous requests. Experimental results show that RAMoLE consistently outperforms baselines, highlighting its effectiveness and scalability.
No code. I remember some anons wanting something like this. there was a prior similar paper (that they cited but didn't test against it seems) https://arxiv.org/abs/2404.13628
>>
>>101155659
I can't.
>>
>>101155630
Get more RAM.
Can you fit 64gb?
Then you can mixtral at least until bitnet
>>
>openbmb/MiniCPM-Llama3-V-2_5-gguf
how can I run this multimodal model?
>>
>>101155709
>at least until bitnet
Why is it taking so long?
>>
>>101155736
Money and risk.
>>
> “I need 2400 gb vram? damn. Can I get away with less?” “Of course, just drop the batch size from 1024 to 1 and you only need 10 gb”
Retard.
>>
>>101155630
Koboldcpp or llama.cpp running a gguf quant with some layers on cpu. Assuming you have regular ram.
>>
File: Untitled.png (119 KB, 1033x793)
Interpreting Attention Layer Outputs with Sparse Autoencoders
https://arxiv.org/abs/2406.17759
>Decomposing model activations into interpretable components is a key open problem in mechanistic interpretability. Sparse autoencoders (SAEs) are a popular method for decomposing the internal activations of trained transformers into sparse, interpretable features, and have been applied to MLP layers and the residual stream. In this work we train SAEs on attention layer outputs and show that also here SAEs find a sparse, interpretable decomposition. We demonstrate this on transformers from several model families and up to 2B parameters. We perform a qualitative study of the features computed by attention layers, and find multiple families: long-range context, short-range context and induction features. We qualitatively study the role of every head in GPT-2 Small, and estimate that at least 90% of the heads are polysemantic, i.e. have multiple unrelated roles. Further, we show that Sparse Autoencoders are a useful tool that enable researchers to explain model behavior in greater detail than prior work. For example, we explore the mystery of why models have so many seemingly redundant induction heads, use SAEs to motivate the hypothesis that some are long-prefix whereas others are short-prefix, and confirm this with more rigorous analysis. We use our SAEs to analyze the computation performed by the Indirect Object Identification circuit (Wang et al.), validating that the SAEs find causally meaningful intermediate variables, and deepening our understanding of the semantics of the circuit. We open-source the trained SAEs and a tool for exploring arbitrary prompts through the lens of Attention Output SAEs.
https://robertzk.github.io/circuit-explorer
weights linked in appendix. probably only interesting for those who want to poke around
>>
File: 4871575.jpg (6 KB, 150x150)
>>101155841
>assuming the oldfriend cute chibi vramlet burger chan poster doesn't know about ggufs
>>
Is there a reason not to get an a6000 for training? Seems like a decent upgrade from 3090.
>>
>>101155940
>>101155940
>>101155940
>>
>>101155922
One day I'll get a job and buy a new computer. You'll see! (I won't though)
>>
>>101154462
I mean, they're a small company, they can't risk giving their best model to everyone for free. Look what happened to StabilityAI, they're on the verge of bankruptcy because of that.
>>
>>101155041
you hated him because he told the truth
>>
File: file.png (159 KB, 600x600)
>>101156300




