/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101134566 & >>101125756

>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started

►Further Learning

Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
File: tet.jpg (44 KB, 637x358)
►Recent Highlights from the Previous Thread: >>101134566

--Paper: Adam-mini: Use Fewer Learning Rates To Gain More: >>101141337 >>101141838
--Papers: >>101140697 >>101140766 >>101140609 >>101140655 >>101140878 >>101141988 >>101140729
--Template for AI-Powered Person with Human-Like Interactions: >>101141420 >>101141500
--Voice Synth After Elevenlabs' Changes: New Projects and Challenges: >>101144328 >>101144357 >>101144434 >>101144650 >>101144660 >>101144808
--Vectordb-ing Wikipedia for Efficient Querying and Embedding Archives: >>101141307 >>101141318 >>101141329 >>101141327 >>101143811
--Probllama: Ollama Remote Code Execution Vulnerability (CVE-2024-37032) – Overview and Mitigations: >>101134926 >>101135029
--Overclocking A6000 Memory for Performance Boost: >>101138329 >>101138358 >>101138445 >>101138570 >>101138599 >>101138640
--Hypothesis for Improving Character/System Prompt Following with Stat Tracking Section: >>101139285
--DCLM's Standardized Corpus of 240T Tokens from Common Crawl: >>101135598
--Claude 3.5 Sonnet vs GPT4o: Model Comparison and Limitations: >>101135803 >>101135886 >>101135872 >>101141487
--Cambrian-1: A Vision-Centric Multimodal LLM for Enhanced Spatial Understanding in Text RP: >>101142603 >>101142681
--Research on Predictable Decision Making in LLMs by Siyan Zhao: >>101136382
--Open LLM Leaderboard Updates and Skepticism: >>101139019 >>101139036 >>101139045
--Llamafile 0.8.7 Released with Fixes and ARM Performance Boost: >>101140227 >>101141232
--LLMs' Reasoning Ability and Dataset Limitations in Character Counting Tasks: >>101134613 >>101134742 >>101134793 >>101135151 >>101140188 >>101140213 >>101140272 >>101140325 >>101140442 >>101140673 >>101140761 >>101140874
--Jamba Instruct Model Released on OpenRouter Platform: >>101137926
--Benchmark: PyTorch 55% Slower than llm.c for GPT-2 Training: >>101136766
--Miku (free space): >>101136681 >>101137085 >>101141355 >>101141139

►Recent Highlight Posts from the Previous Thread: >>101136593
So what's the deal with DRY repetition penalty? I read about it: it combines tokens, so it tries to prevent full phrases from repeating instead of just single words or tokens like rep penalty. Is it better? Does it work well? What range do you set the multiplier to?
I guess it's over and Nvidia will soon close their regular GPU department. Why make 40/5090 when they can make more workstation cards?
In theory it should work well exactly because it deals with n-grams instead of tokens, but I haven't had repetition issues in so long that I didn't even bother testing it.
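The n-gram idea above can be sketched roughly like this. It is a simplified illustration, not the actual sampler code: the function name is made up, and the defaults (multiplier 0.8, base 1.75, allowed length 2) and the exponential penalty formula are assumptions based on how DRY is commonly described.

```python
def dry_penalties(tokens, multiplier=0.8, base=1.75, allowed_length=2):
    """Return {token_id: penalty} for tokens that would extend a repeated sequence.

    Hypothetical sketch: a real sampler would subtract these penalties from
    the logits of the corresponding tokens before sampling.
    """
    penalties = {}
    n = len(tokens)
    if n == 0:
        return penalties
    last = tokens[-1]
    # Look at every earlier occurrence of the most recent token.
    for i in range(n - 1):
        if tokens[i] != last:
            continue
        # Count how many tokens before position i match the current tail.
        match_len = 1
        while match_len <= i and tokens[i - match_len] == tokens[n - 1 - match_len]:
            match_len += 1
        if match_len >= allowed_length:
            # The token that followed the earlier occurrence would continue
            # the repeat, so it is penalized more the longer the match is.
            next_token = tokens[i + 1]
            penalty = multiplier * base ** (match_len - allowed_length)
            penalties[next_token] = max(penalties.get(next_token, 0.0), penalty)
    return penalties


# The context ends in "1 2 3", which already appeared followed by 4,
# so only token 4 receives a penalty.
print(dry_penalties([1, 2, 3, 4, 1, 2, 3]))
```

Because the penalty grows with the length of the match, a single repeated word is left alone while a long verbatim phrase gets pushed down hard, which is the difference from plain rep penalty.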
bitconnet... what went wrong...?
thank you recap anon
people always preferred bytes.
too dangerous for our democracy.
Is that Cambrian model relevant when there's chameleon? (and can chameleon now output images when finetuned, or not?)
does it affect code/function calling/json outputs?
Not Miku
I just don't get the race obsession.
He's american tranny
friendly fire
Damn I was going to ask for opinions on the 4080 Super since it's now under $1k, but damn, what's the point when you can buy a used 3090 for $700.

Also, wholesome Migu.
How the hell do you prompt CR+ in sillytavern, not talking about the system prompt, just the whole situation, can anyone give an example screenshot? Would really appreciate it. Because either my quant is too low, or ST's default example prompts for it are shitty, or both.
File: 1711072659524103.jpg (811 KB, 2048x2048)
>is it Teto Tuesday already?
File: rag.png (21 KB, 593x439)
Bros... is it now well and truly ogre for us? How do we compete with this?
It is a mystery.
Isn't this old? Or is this a new update and now it automatically puts your chats in memory?
It's the first time i see this popup. Now in settings i can go to a "memory" page but it appears to be empty still, even after a bit of chatting. I'm not sure if this is actually RAG or if it's just some kind of system message injection.
What exactly do you want to know if not the system prompt? You mean preset? Default silly preset is indeed bad, use this in Story String:
# System Preamble
You are a co-author, writing with me.
## Style Guide
You narrate for {{char}}.
{{#if system}}{{system}}

## Additional information about {{char}}
{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}### {{char}}'s personality:
{{/if}}{{#if scenario}}### Scenario:
{{/if}}{{#if mesExamples}}### Example dialogue:
{{/if}}{{#if wiAfter}}{{wiAfter}}

# User Preamble
I will be narrating for {{user}}.

{{#if persona}}## Additional information about {{user}}
Ok this isn't new then. Weird that you're somehow just getting this now.
maybe cause i'm a leaf
>no safety preamble
>no expert roleplayer instructions
highly suspect
Not him but thanks.
I wanna finetune wizard 8x22 on limarp.
How much VRAM do I need and what should I use? Axolotl?
File: 1695912428662624.png (22 KB, 778x290)
>maybe if I throw more slop at this slopped model it will be less slopped
Holy fuck dude...
>2 bit qlora
wait is that a thing? How retarded are the results?
File: What.jpg (159 KB, 2880x1406)
What do they mean by this?
It's fine, one epoch of 4bit qlora is totally enough to get money on ko-fi
I can afford the machine for 8bit qlora, but - how would dataset size impact the requirement (would it?) and how long will it take?
Yeah, very weird. What surprise could they have that they plaster it over a leaderboard?
Generally, model training size > bits.
A 2bit 8x22b model is generally going to outperform an 8bit 13B model. That said...
Real surprise. With real confetti this time and with real tragic consequences.
jameleonbyte-bitnet-MoE-MoA-MoM-MLA 600b when
in 20 hours it will be magically transformed into an actual useful leaderboard
>not the SuperCoT finetune
wake me up when
So a Nala test leaderboard?
I totally forgot this shit existed kek, now only chatbot arena is relevant
I'd pay actual money for that.
>he still thinks the chatbot arena is relevant
>no safety preamble
Found it useless since it could say everything already without it.
>no expert roleplayer instructions
That's what system prompt is for. ({{#if system}}{{system}}{{/if}})
Boxed my 3090s. I'm waiting for that architectural breakthrough because current gen is not good enough
I want their length control thing. I really like it.
Isn't this RAG?
Are the new snapdragon laptops good for llm?
Does llama work on them?
File: 1683495417317.png (136 KB, 542x476)
File: miku2061.png (1.2 MB, 832x1216)
>Also, wholesome Migu.
Miku remembers those days fondly
>P40s on ebay went from 200 to 330 euro in the span of 6 months
>Can now ACTUALLY buy 3090s for 550 to 750 euro

Okay, at this point it may actually be worth to buy a 3090 to accompany my 4070, instead of making a secondary server with 2 P40s
Is 36gb of vram a meme?
what are the purple and green juices ion get it

Not sure of the implementation details. For now it's not remembering anything for me.
File: Dr-piccolo.png (656 KB, 1148x1411)
Daily Dose
File: 1467713503540.png (1.3 MB, 1054x1600)
Anything less than half a terabyte of VRAM is a meme, and even that will probably only futureproof you till the end of the year.
this used to make me feel bad but repeatedly seeing it here has inoculated me to it, I think it's fine now
File: IQ3-XXS.jpg (46 KB, 600x480)
You need at least 50 million gigaquads of capacity just to run an AI doctor, and forget about it having any bedside manner.
Why did it make you feel bad? It represents blissful happiness.
I just want to be happy with my AI waifus...
And fap with them
I've never watched star trek but I appreciate it being used for a joke here.
just remove all the bullshit like singing opera from his program it wont take that much
Nonsense. What good is a doctor that can't sing opera?
This table is completely useless without the batch size. But I guess it's batch size 1 since it shows 7B as just 6GB.
limarp isn't slop, it's peak fanfiction
I've combed through LimaRP before. All the scalie/furry stuff is peak anthro. That doesn't teach the model anything useful.
this entire thing is a system prompt >>101147419
File: 00015-1664642145.png (1.83 MB, 1456x1024)
>Implying a better pass rate for the Nala test isn't useful
Nala needs human on feral training. LimaRP lacks feral training is what I'm saying.
There's like one human on feral dragon but it's straight up vore. The rest of the furry/scale stuff is just straight up anthro with no attempts made to describe anatomical interaction in any novel fashion
Llama.cpp maintainers have decided that they don't want the server to be anything more than a reference implementation to help them test things.
He's literally me btw.
I'm pretty sure we're overdue for a good model release. Where is it?
Cohere are on it.
llama 3.5 in july
this is simultaneously gross and hilarious
I'm on it
A single 3090 running L3 8B at 8_0 or even fp16 is all you *reasonably* need. Beyond that, you're entering the slippery slope of "it's never quite enough" and "if only I had one more..."
8B doesn't catch up to 70b before fp32 though
>A single 3090 running L3 8B at 8_0 or even fp16
Coincidentally, I tried L3 8B at 8_0 and fp16 today, for RP. All it would do is babble incoherently and run on and on. Maybe it was because I tried abliterated (because last time I 8B'd it was coherent but low quality and balked at everything) but also not an enjoyable experience.
no, it's over.
ERPers should also consider model intelligence against reply speed. You might run 70B but have to wait 10-30 seconds for a reply, vs. nearly instant replies from 8B.
>8B doesn't catch up to 70b before fp32 though
???? Huh? Am I missing some magical herculean leap in performance at full precision that would let 8B surpass 70B? We talking Q1 levels of brain damage on the 70B?

You DID get instruct, right? Not base llama 3?
>babble incoherently
yeah, you did something wrong. I feel it's possible you're one of those retards that tweak the model alpha and forgets about it.
>With over 500,000 tokens per second running Llama 70B, Sohu lets you build products that are impossible on GPUs. One 8xSohu server replaces 160 H100s.
>first specialized chip (ASIC) for transformer models
I tried
It wasn't like spewing complete nonsense, but it was like it was on a sugar high and throwing in lots of choppy short sentences and *asterisk crap* and running on for way too long while screwing up references to events of the last exchange or which character it was supposed to be.

Oh, hi Mark. It's good to know that the requisite guy who waits for someone to have a problem and then calls him a retard while offering nothing constructive is back. Going most of the morning without that was so disquieting.
Ok. How many kidneys do I need to harvest and sell to afford one?
You just need tree fiddy.
>One 8xSohu server replaces 160 H100s.
How much VRAM and what's the price of 8xSohu compared to 160 H100?
Seems like the only information they've divulged so far is the purported t/s.
It's hype only because it's probably not far into development and I look forward to never hearing about it again.
Damn, I thought it was June
July is next week
File: file.png (108 KB, 635x782)
even if fake and gay, dedicated chips for ai meme is good, anything to kill nshittia.
>while offering nothing constructive is back
I mentioned your alpha might be fucked, you mouth breathing ape
mistral guys are going to drop a REALLY good open source model very soon
t. work for them
These dedicated chip companies pop up every few months, make bold claims, suck up investor funds, then nothing ever comes of them.
>really good open source model
This is too absurd even for a fic. Mistral is irrelevant nowadays.
The only alpha I know about is the estimated rate of Type I Error. Is that somewhere in Kobold's settings, perhaps under a different name?
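For reference, the "alpha" being argued about is most likely the NTK-aware RoPE scaling factor that some loaders expose for stretching context. A minimal sketch of the usual formula, assuming that reading is right; the function name is hypothetical and the defaults (base 10000, head dimension 128) are Llama-2-style assumptions, so check your model's own config:

```python
def ntk_rope_base(alpha, base=10000.0, head_dim=128):
    """Scaled RoPE frequency base for a given NTK alpha.

    Assumed formula: base * alpha ** (head_dim / (head_dim - 2)).
    alpha = 1 leaves the model's native base untouched.
    """
    return base * alpha ** (head_dim / (head_dim - 2))


for alpha in (1.0, 2.0, 4.0):
    print(alpha, round(ntk_rope_base(alpha)))
```

If alpha is left above 1 while running at the model's native context length, the positions the model sees no longer match what it was trained on, which can plausibly show up as the rambling, off-track output described above.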
All they have to do is use the same dataset they used on miqu on llama 3.
8B doesn't catch up to 70b before fp64 though
They just hit the $5B valuation a while ago. Surely they're putting all the investor money into good use.
They sure are! And all you have to do is pay a fee for the API to see the fruits of their labor :D
I don't believe you. Mistral has lost it after being bought by microsoft. It will be full of safetyslop and even more positive than mistral, isn't it?
Dunno that worked because their data was better than the llama2 data, might be different for llama3
that's simply not true
Does no one dislike cohere for not releasing a base model?
Then why did they remove multimodal?
no, not really.
Does anyone here even care about multimodal? What's the use case?
precisely because the stability of the server is more important than poorly implemented features like multi-modal
Into slopping it to be the perfect safe AI assistant, maybe. These companies are basically only good for one model release, once they become successful and get bought out it's over.
So anyone got a favorite model for writing long erotic story? I dont mean the goyslop, I mean the explicit erotic stories.

Preferably under 20B model
>feed it an image of a UX
>Recreate this UI in html/css/js for me
I'd guess the leaderboard broke because of a bug, and the surprise will be a useless improvement to their leaderboard
it's the most important development. what do you do when you run out of text data? you have to find other sources of data (modalities). it will be the biggest factor in increasing model "intelligence". humans are trained on so many modalities, it has to be one of the missing pieces.
They are remaking the multimodal code from the ground up based on some other changes they made right?
No, 8B is unusable. If I still had a single 3090, I would cope with MoE models. But getting a second one is definitely worth it.
How is every mikufag this retarded?
more or less. the server needed a big refactor, and as part of that refactor the multi-modal support was removed because the implementation was not very good. the plan is to add it again in the future together with a big refactor of the multi modal model support. llama.cpp never really supported multi-modal models, it was added as an example using ggml to obtain the embeddings, but it was never part of the core llama.cpp library.
No, they stripped it out because they thought there was a "cleaner" way to implement it. So instead of cleaning it up, they ripped out the feature entirely and left it like that for months now.
you can always volunteer to clean it up yourself if the feature is important to you
Maybe I would, but I only know Python, not sepples.
Based Doctor poster
>working for free
>working for free without any guarantees that your work will be used
>working for free without any guarantees that your work will be used, for a private company (ggml.ai)
maintaining your own fork with the changes you made yourself is free
I mean the multimodal functionality itself, like do you really want to give it image inputs. Talking about how it might improve the llm capabilities, I'd hope so, but not convinced after seeing the first multimodal models, I was more hopeful for it before anyone tried it out.
I would rather just use koboldcpp at this point. Or making something from scratch.
Make sure if you are using SillyTavern that you have the latest presets https://huggingface.co/Virt-io/SillyTavern-Presets/tree/main/Prompts/LLAMA-3/v1.9

There's a lot of stuff that barely works for LLaMA3 and will dramatically lower the quality of your roleplay.
>Or making something from scratch.
Yeah you do that man
Would someone kindly leak Opus 3.5?
Thanks, Satan.
I haven't gone so far as Silly Tavern. But I may need to. I'm doing strange things in Kobold's Author's Note interface.

I've been trying to find a way to get the AI to have kind of a meta conversation about its writing without actually interrupting the RP. And it is kinda working.

The problem is that it feels like I'm calling a 976 number, and sometimes I get someone who's worth the $4.99 a minute, and other times I get a moron.

Like, I had a really long RP run till I guess context overran enough that everything fell apart (though the AI apologized for assigning a previously encountered character's name to a new character; that's where I figured context was trashed), and then I did a post mortem and worked with it to revise my meta conversation stuff and it was feels good man.

Then I start a new RP with the same model, and it's mucking up the meta convo, only half following instructions, outright telling me that it's ignoring directives to make the RP easier to follow by citing the scenario (even though I provided the scenario in the first place).

But a reboot of Kobold and now it looks like I've got a partially useful instance going. (It's goofing directives but at least doing most of the meta right.)

Is there value in restarting Kobold after a while to clean out the bit buckets or is this placebo?
They didn't release a base model? That sucks. I do dislike them more relative to other companies if that's true.
is qwen 72b really old gpt4 level?
I'd rather they leak 4o, personally. Imagine having its voice and image gen capabilities.
>Old GPT-4 level
Thank god it's not just me. I dunno what the fuck they did to base 4, but it sucks giga ass now.
you can send dick pics to your waifu, you can send her pics of herself so that she knows what she looks like, you can talk to her, she can moan for you, sing to you, she can generate pics of herself. there's a lot of possibilities. just think of all the things you'd do if you had a long distance relationship. most exchanges might be text, but there'd be a lot of other things.
4o gens images?
it can but they'll never let you use it because they hate fun
Why? No one is doing jack shit with the base models we do have.
I mean, make something from scratch while using llama.cpp as a library. I certainly don't think I can make everything from scratch.
We can dream, but even if 4o leaked, no one would be able to run it, it probably wouldn't even be able to be quanted since it wouldn't be in the right format.
Thank you, what I meant was things like System prompt prefix/suffix, user prompt prefix/suffix, etc, etc, but I also needed an improved story string, so that's very helpful. System prompt would be useful too. I have no experience with CR+ prompting, just the usual alpaca, vicuna, chatml, llama3.
>it can but they'll never let you use it because they hate fun
why not? OpenAI already let us gen images with dalle3
I just use the python llama.cpp library, why does no one seem to use it?
but you kinda already can do all that, chaining multiple models together.
oh no
what we do
it over
it's a very janky experience, native MM makes it a lot better
Is talking like a servile Indian man still funny on this board?
python is bloat

you need 50+GB for garbage with no portability
>saar stop talking bad about us indian sirs benchod!
aka "No Fun Allowed" police, you will never be a janny.
Why doesn't Anthropic release their older models to public? Nobody would even care if Claude1 was leaked since we have much better models already. Or are they full of EA shit?
It's too dangerous given how much better it is or they haven't found a way to reliably watermark it yet without ruining quality.
or it's just bad
Because they would gain nothing from doing that.
this isn't reddit faggot, if you find this offensive maybe you should vent your frustrations to your wife's boyfriend, nigger
is there a bigger difference between you and a panda or between you and GPT-4o?
There is basically no difference between me and a sad panda.
i think this is much more plausible. converting an image to latent and understanding vague attributes of it is a completely different ball park from rendering pixel-by-pixel and have it look good.
assuming the MM model becomes sufficiently proficient in all of the modalities.

Using specialized models for each function will allow each to be optimized for its task, but it does require a rather sophisticated dispatcher module that glues it all together.

I couldn't say which i think holds the more promise in the long term.
it will never not be funny, jeet
What is bitnet and why should I care about it?
Not every action has to be gainful, they could do it just as a gesture of goodwill to open source community.
let's say, true unquantized 34B bitnet model on ~12 gb vram, smol size - same f16 or whatever precision.
Bitnet is basically a transformers architecture, but the difference is that the weights are at 1.58bit instead of 16bits, and they realized that pretraining at 1.58bit gives the same accuracy as fp16, so basically we'll be eating really good with that one, just imagine a 90b bitnet that can be run with only a 24gb vram card that has the same accuracy as a fp16 90b transformers model
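To put numbers on the claims above, here is a back-of-the-envelope calculation of the memory taken by the weights alone. It is only a sketch: the function name is made up, 1.58 bits is taken as log2(3) for ternary weights, and it ignores the KV cache, activations, and any layers kept at higher precision, so real usage would be higher.

```python
import math

# One ternary weight (-1, 0, +1) carries log2(3) ~= 1.58 bits of information.
TERNARY_BITS = math.log2(3)


def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params_billion * bits_per_weight / 8


for size in (34, 90):
    fp16 = weight_memory_gb(size, 16)
    ternary = weight_memory_gb(size, TERNARY_BITS)
    print(f"{size}B: fp16 ~{fp16:.0f} GB, 1.58-bit ~{ternary:.1f} GB")
```

Under these assumptions a 90B model needs about 180 GB of weights at fp16 but under 18 GB at 1.58 bits, which is why it could in principle fit on a single 24 GB card, and a 34B model would land well under 12 GB.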
ok i take it back
stheno 3.2 is retarded

mixtral 8x7b limarp zloss, i'm back...
Patching different models together after the fact means a lot of information loss happens in the middle. The quality would suffer a lot. The reason 4o is so good at voice/image is because it's all native.

They have shown that it's a step above current dedicated models. Sure, they might've been cherry picked, but I don't think it's that unbelievable that it's true. We always knew that having multiple modalities would improve performance one day, but we just didn't have the right architecture to make it work.
Bitconnect was a cryptocurrency investment platform that operated from 2016 to 2018. It was ultimately exposed as a Ponzi scheme that defrauded investors of billions of dollars.

Key points about Bitconnect:

>The Scam: Bitconnect lured investors with promises of high daily returns through its "lending program." This program claimed to use a proprietary "trading bot" and "volatility software" to generate profits from cryptocurrency market volatility. However, there was no such technology, and the returns were paid out using funds from newer investors.
>The Collapse: In early 2018, Bitconnect shut down its platform and the value of its BCC token plummeted. Investors lost significant amounts of money, and many were left financially devastated.
>Legal Consequences: The founder of Bitconnect, Satish Kumbhani, was indicted on multiple charges, including wire fraud, conspiracy to commit wire fraud, operation of an unlicensed money transmitting business, and conspiracy to commit international money laundering. Several other promoters were also charged and convicted.
>Lessons Learned: The Bitconnect scandal serves as a cautionary tale for cryptocurrency investors, highlighting the importance of due diligence and skepticism towards promises of guaranteed high returns.
File: 1716719286072843.png (583 KB, 918x916)
>he fell for /lmg/ gaslighting
So it's the solution to the stable diffusion dead end, which will revive /h/hdg?
A paper showed you only need 3 bit of precision instead of 16 for a model to remember everything with no loss.
Which is great, but they trained on a small number of tokens, so it never needed that much precision to begin with.
It's like saying a one car garage can hold just as many cars as a 16 car garage, as long as you only have one car.
Ah that makes more sense, I thought you were implying you would make EVERYTHING from scratch.
Only if it gets released/leaked, though the memory requirements may or may not be out of reach for local.
hey hey heyyyyyy
After Codestral, they'll probably release Mistral-20B-Instruct, but I don't expect anything groundbreaking. Their instruct tunes have become increasingly more cucked and the format feels limited.
>A paper showed you only need 3 bit of precision instead of 16 for a model to remember everything with no loss.
Bitnet is 1.58bit though, not 3bit
i was one of the "shills"
it worked good to some extent, but alas, it flopped hard at some point and subsequently became unusable. Q6_K Mixtral saved the day without a hitch.
the open source community wouldn't give them any money, and guess what, everything companies do is with profit in mind because that's what allows them to make new and better models.
Releasing their old models wouldn't be free either, I bet they would need to sort out bureaucracy, pay someone to write the blog posts and etc...
the latest good open model from them was Mixtral and it was 9 months ago, it better be some good shit anon
A paper showed you only need 1.58bit of precision instead of 16 for a model to remember everything with no loss.
Which is great, but they trained on a small number of tokens, so it never needed that much precision to begin with.
It's like saying a one car garage can hold just as many cars as a 16 car garage, as long as you only have one car.
just have some anon "oops i dropped my claude weights all over the place teehee"
File: 1610351662756.jpg (46 KB, 1024x580)
I'm still not tired of this meme.
8x22 probably cost them more, I hope they just continue training llama3 or qwen2 with a magical recipe
everything with "bit" in its name is doomed to be forever associated with some tainted shady shit at this point

at which point am I allowed to say "llama 4 when"?
llama is a dead end. you know it's going to be bad when ylecunn has given up on llms and is publicly shitting on them at any given opportunity
when timeToRelease === 2MW
Llama 4 could be a LMM though.
>be meta
>gimp your models so they don't say no no words about *any protected group of freaks & schizos* in 2024
>given the architecture and nature of LLMs - final model performs very bad
I think we are still far from a dead end, but we will never get AGI from LLMs. I don't need AGI though, I would be happy with 3.5 Sonnet @home.
who would've thought that lobotomizing a model to not recognize certain pattern would make it dumber overall, me am SHOKED
How long did the qwen team take between releases? 1.5 and 2 I guess?
wait so sillytavern was made by the company that trained command r+?? how the fuck?
1.5 to 2 was about 4 months, but I would not draw any conclusions from that. the amount of time that goes into new models depends on a lot of variables
cohee != cohere, kek
name sounds like someone with lisp saying coffee
File: file.png (10 KB, 289x96)
Technically SillyTavern is just a fork of Tavern, it started as a patch for OpenAI support, which was made by anons.
it was actually trained by a popular mid-2000s prog rock band.
Unironically not their fault. "People" shat on them for releasing Galactica because it "spewed misinformation and racism". It's a miracle they even still release base models. This isn't the same as a small company like Mistral releasing a relatively uncensored thing, since they're nobodies. Maybe one day the cost of training will be low enough that anyone can train huge models, but for now it's only the ones with money (that have to abide by investors and public scrutiny).
They give us the base models trained for intelligence. We can train the smut, copyrighted materials, and FBI statistics back in if we want. But the people with the resources and interest only bother to train braindead 1 epoch loras on gptslop logs.
now with that new magpie paper, if true, we'll get much better models when training on gpt
File: PXL_20240625_310830628.jpg (746 KB, 1498x1436)
oh hai /lmg/
i haz boxes
halp me unpack?
*touches box*
*shits on your box*
*eats it*
Any cards that do interesting experimental prompt stuff? I just found a card where they use the lorebook feature to insert information depending on the "Day" stat. I want to see more stuff like this.
*bites lower lip, thinking about the journey ahead, eyes sparkling with anticipation*
File: PXL_20240625_313131315.jpg (541 KB, 1401x1227)
omg it is migu
looks like she had a rough trip
File: tet_tunic.png (2.85 MB, 1328x1992)
This may be of interest to you:
> The extension allows the user to configure a number of prompts that are automatically sent after the AI's response to the User's prompt, adding the result of each prompt as an individual message to the chat, as a form of persistent context that gets updated after each turn
File: ComfyUI_00692_.png (1.17 MB, 832x1216)
omg it is piku
Has anyone ever tried using RAG for T2T generation? Basically I have a dataset of sentences and I'd like to rewrite them a particular way, notably changing certain words (but this implies making other types of modifications in the sentence in my language, for example in terms of number or gender). I thought that by having some RAG database the system can rely on to find the closest sentence structure, it could help with better generation. Actually you can consider my task as close to a translation task. I tried searching for RAG T2T but it doesn't seem very popular right now. Any ideas?
Interesting, but are there any cards that use this to do unique things that aren't just stat tracking?
I really doubt you're going to get good results that way. Embeddings prioritize content over grammar. You'll likely be frustrated with the match distances you'll get.
Why not try it? Shouldn't take that long to implement a test and see for yourself if the results are good enough.
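A quick test along those lines could look like the sketch below. It is only one way to set it up: instead of embeddings it ranks stored examples by surface similarity using `difflib` from the standard library (which sidesteps the content-over-grammar concern), and the function names, prompt wording, and example pairs are all made up for illustration.

```python
import difflib


def closest_examples(sentence, examples, k=2):
    """Return the k (source, rewritten) pairs whose source looks most like `sentence`."""
    def similarity(pair):
        # Character-level similarity in [0, 1]; favors similar structure.
        return difflib.SequenceMatcher(None, sentence.lower(), pair[0].lower()).ratio()

    return sorted(examples, key=similarity, reverse=True)[:k]


def build_prompt(sentence, examples, k=2):
    """Build a few-shot rewriting prompt from the retrieved pairs."""
    lines = ["Rewrite the sentence in the same way as the examples."]
    for source, rewritten in closest_examples(sentence, examples, k):
        lines.append(f"Input: {source}")
        lines.append(f"Output: {rewritten}")
    lines.append(f"Input: {sentence}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical dataset of already-rewritten sentences.
pairs = [
    ("The cat is black.", "The cats are black."),
    ("He walked home yesterday.", "They walked home yesterday."),
    ("The dog is brown.", "The dogs are brown."),
]
print(build_prompt("The bird is brown.", pairs))
```

The resulting prompt goes to whatever model you are using; comparing its output against a plain zero-shot prompt on a few dozen sentences should show fairly quickly whether retrieval helps.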
File: PXL_20240625_212137575.jpg (524 KB, 1908x1197)
fuck. more boxes
this will take longer than i thought
File: ComfyUI_00142_.png (875 KB, 1024x1024)
why did it take you so long to open the box..
File: 20240625_180100.jpg (85 KB, 800x550)
oh hi there
Cool Gardevoir plushies.
Regular and shiny!
File: PXL_20240625_220646464.jpg (895 KB, 1989x1369)
i've been drinking plz understand
mounty thingy got bent. i guess they couldn't be bothered to invest $5 in packing foam
came with a couple ancient M60s. guess i can sell them or something
so anon what did you order
File: PXL_20240625_220707121.jpg (794 KB, 1435x1679)
i think it's a computer
picrel is an nvlink sxm board
Yeah, that's why I was hyped for this week.
Fuck I'm late to the party.
File: ComfyUI_00343_.png (1.83 MB, 1024x1024)
ru sure u should be opening such valuable items when drunk
File: PXL_20240625_222408282.jpg (643 KB, 1400x1484)
no but what's the worst that could happen?
thats a lot of stuff anon, how did you acquire that box
i found it. don't worry about it
k im gonna find u then....
...worry about it
Cutting ribbon cables with Miku
File: PXL_20240625_231559779.jpg (1011 KB, 1977x1205)
uWu wut r u going to do when u find me?
File: Capture.png (74 KB, 1296x1011)
So now we have 2 V100 max anons?
This is good.
nice. did you get a good deal for those? seems hard to justify doing now if not since prices will crash next year as datacenters dump them
File: PXL_20240625_233002121.jpg (797 KB, 1513x1569)
ok meta. i'm ready for 405b
Miku, Guardian of Volta
You could use the prompts to have the model output specific information that can trigger lorebook entries.
The actual point of that extension is simply to lessen the burden on the model by feeding instructions one (or a couple) at a time, since too many instructions confuse smaller models and make them extra dumb.
I will implement a keyword feature, similar to lorebooks, so that these prompts can be triggered conditionally.
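A keyword trigger like that could be sketched roughly as follows. This is not the extension's actual code: the class, the field names, and the matching rule (case-insensitive substring over the most recent messages) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FollowUpPrompt:
    """One prompt sent after the AI's reply (hypothetical structure)."""
    text: str
    keywords: List[str] = field(default_factory=list)  # empty list = always fire


def select_prompts(prompts: List[FollowUpPrompt], last_messages: List[str]) -> List[str]:
    """Return the prompt texts to send this turn, in their configured order.

    A prompt fires when it has no keywords, or when any of its keywords
    appears in the recent messages (case-insensitive substring match).
    """
    haystack = " ".join(last_messages).lower()
    return [
        p.text
        for p in prompts
        if not p.keywords or any(k.lower() in haystack for k in p.keywords)
    ]


if __name__ == "__main__":
    prompts = [
        FollowUpPrompt("Summarize the scene in one sentence."),
        FollowUpPrompt("Update the HP of everyone involved.", ["fight", "attack"]),
    ]
    print(select_prompts(prompts, ["The goblin attacks!"]))
```

Sending only the prompts that match keeps the per-turn instruction count low, which is the stated point of feeding instructions one or a few at a time.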
Smaug is retarded, every version of it is always retarded and much dumber than whatever model it was based on, and yet mergers always keep including it in their mixes for some reason
i'm going to get behind you, put my hand over your mouth.. and then you'll fall asleep because over my hand there was a cloth
after that i'm going to undress....

..undress the rig and steal all the parts
>smaug is retarded
>mergers are retarded
it's like poetry
are you going to run it in 2bit or what?
File: 1715277591317631.jpg (1.27 MB, 2048x2048)
Those hands are god-tier for SD. What model/workflow?
Finally someone itt with moar VRAM than me
Where can I find a slop-free RP dataset?
LeCun wants AI to be more than just LLMs. Maybe even until they have consciousness. Imagine, your local Miku having a real consciousness. She'll finally be real, not just a mimicry.
embarrassing manchild
File: 1705326754733957.jpg (237 KB, 1920x1080)
Any kind soul that could recommend a TTS to make Neco-arc read my unending backlist of papers?
So are there any bitnet/1.58bpw models available to run with significant numbers of parameters? I have 32gb vram, i keep hearing about this shit but the only models i've seen are teeny.
you're posting in local manchild general
what would you do if you had like $100,000 to spend on hardware?
spoke to higher ups today about the benefits of hosting our own server versus renting time on someone else's. if i can make a good argument i can probably get some money diverted.
>70% furry and 30% loli
>% totals to only 100%.
Have they been slacking or is there only space for one tag at a time?
What are your requirements? For that much you could probably build with 2xH100 for about 160GB total VRAM.
Used consumer or server hardware, for example 30-50 of 4-6x3090 or 4xv100 machines. But that stuff isn't supported or maintained, so not something your company would buy, also the power bill would be hilarious, but imagine 100-130 3090s, just 2.4TB of VRAM? if you had GPT-4 weights you could even run it! Of course the interconnect and networking will kinda suck, but depends on what you need...
File: 1702200312013572.png (31 KB, 897x378)
Hey friends, where do I add these things? Is it under "Story String"? Instruct Mode Sequences have similar things written on it but they're separated and slightly different.
H100s don't make sense unless you're filling a datacenter with them, I would put together an A100 rack, and if that doesn't work out i would just be like "let's buy a bunch of quadros/4090s"
File: ComfyUI_00690_.png (1.18 MB, 832x1216)
>Those hands are god-tier for SD. What model
autismmix ( https://civitai.com/models/288584?modelVersionId=324619 ) ( has ponyxl as base (ponyxl is good) )
here's anon's workflow (better besides hair color)
in my workflow im using tensorrt and no loras, nothing special really
It's in the instruct mode sequences.
Silly Tavern already has the template built in if you are using that.
File: 1703934839568504.png (173 KB, 1866x631)
It's slightly confusing because I don't really understand the correct place I should be putting each line in.
Left is the original, middle is the one I've modified, right are the instructions.
chemical manufacturing.
proposals they like are stuff like processing and categorizing like 30 years of documents and data.
some sort of internal tool that could parse them and pluck insights out on demand.

even that was something they were really excited about and i don't think we'd need an unbelievably beefy machine to do it, but they're open to the idea and it'd be sweet to get to fuck around with serious hardware.

i figure with that sort of compute you could probably explore forecasting and anomaly detection for production processes. not really LLM but just a secondary benefit of a dedicated server. there is a shitload of real time data (temperature, flowrates, pressure, etc).

we have a couple 4090s but there's only so much you can do. i'm kind of secondary to the group who is doing this. i'm doing more machine learning stuff but we work together.
The one on the left is already correct according to the instructions.
Ah...okay...I apologize for the dumb question...
Thread theme anon made it in! A shame about it thinking we'd be safetyfags though.
>sign in
You should be able to see the links fine. Just don't click anything, that triggers a log in screen.
Oh I'm retarded, these are the same links.
Intended URL: https://websim.ai/c/bA64LoXlbn3vs2u2M

Intended URL: https://websim.ai/c/578BMgWKq5HmYcp7a
Just having a laugh playing with this my man. You can do what you want.
ugh fine ill let you play with it
It's alright.
Look at the final prompt either in the browser's console or in the backend window to see how the prompt template is actually being used. That'll help you understand how those fields are being applied.
lol, nice
I've been under a rock, is Midnight Miqu still queen of the 32k context 70B models?
>the tiny 8b model doesn't outperform mixtral, therefor its garbage
are people really this retarded?
File: -.png (8 KB, 472x80)
>enable dry
>doesn't show up in ui
wat do
Meta claimed 8B beat previous generation 70B. So surely it can beat ~42B Mixtral.
go advertise your shitty side project somewhere else
Oh yeah, meta's claims were absurd. But it's still a lot better than any 7b models we had before.
Meta said that to generate hype, obviously that's pure cope.
Huh? This is minimum hardware required. Of course it’s batch size 1 retard.
bullshit, everyone here was running "hurr durr this 8B model is GPT-4 killer!!!" first weeks after llama3 release.
File: file.png (113 KB, 1184x747)
check picrel, llama-2-chat tunes were shit, remember anon? oh wait you're a newfag~
[citation needed]
what other things do the voices in your head tell you?
File: MikuAten.png (1.56 MB, 832x1224)
>Midnight Miqu
No. Solar Eclipse Miqu is the new sota
I'll take "things that never happened" for 500
They never said that. What Zucc said was that it's pretty close but not in every aspect.

Also Mixtral beat 70B previously, according to anons, so it makes sense that an 8B that almost but not quite matches the old 70B still does not beat Mixtral.
>for some obscure general with extremely low activity no one knows and cares about
you for sure got him! /s
go back

Mistral Exec says they won't release Mistral Large due to business responsibilities taking precedence over openness.
File: 1569991762929.jpg (93 KB, 874x612)
>no mention of mistral medium or next
Nothing wrong with that. Just their early marketing before they got acquired that was the issue. Using "open source" to hype themselves up and then close things off later. Typical.
based 2025oldGOD destroying clueless newfags
I look forward to seeing the results of this (different anon here catching up on threads).
That wasn't stated neither in your post nor in that image, retard.
File: file.png (2.24 MB, 1430x1448)
people say it's a downgrade from 3.2
i got better results from euterpe in 2021 than any l3 8b model, just take the tokens and run cr/mixtral if you're poor
mixtral limarp zloss eh?
based Chambraigne
lol /s
i don't care about it being used by leddit exclusively.
>Only getting ~0.8 t/s on CR+ GGUF.

Sorry, what's holding back CPU inference speed? RAM frequency or CPU clock speed? Cause AMD Ryzen 9000 series is out next month. If it helps t/s to upgrade, I would do it.
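For single-stream token generation the usual limit is memory bandwidth rather than clock speed, because each new token has to read roughly the whole set of weights from RAM once. That gives a rough ceiling of bandwidth divided by model size. The sketch below works through the arithmetic; the RAM configuration (dual-channel DDR5-6000) and the ~4.5 bits per weight are assumptions picked for illustration, not the poster's actual setup, and real throughput lands below the ceiling.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical RAM bandwidth in GB/s: transfers/s x channels x bytes per transfer."""
    return mt_per_s * channels * bus_bytes / 1000


def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_s(bandwidth_gbs: float, size_gb: float) -> float:
    """Upper bound on t/s if every token reads all weights once."""
    return bandwidth_gbs / size_gb


if __name__ == "__main__":
    bw = peak_bandwidth_gbs(6000)      # assumed dual-channel DDR5-6000
    size = model_size_gb(104, 4.5)     # CR+ is 104B params; 4.5 bpw is an assumed quant
    print(f"{bw:.0f} GB/s, {size:.1f} GB, ceiling {max_tokens_per_s(bw, size):.2f} t/s")
```

Under these assumptions a faster CPU on the same memory would not move the ceiling much; faster RAM, more memory channels, or a smaller quant would.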
Maybe if you defect to the llamafile camp you will get better t/s with AVX-512
even cr non-plus is glacial compared to 70b for me
Why CPU over p40?
NTA, but why is llamafile faster, did jart add some custom AVX-512 optimizations, if so, why wouldn't llama.cpp bother adding them?
llamafile has a shit license and conflicts with MIT. He contributed some bits to llama.cpp, but only so that he doesn't have to keep patching it on his side.
File: .png (389 KB, 918x916)
>MIT cucks BTFO'D by tranny
File: 8f8f8.u3.jpg (28 KB, 600x600)
Check it anon:
And one for MOE & AVX2:
*GPL licenses are a nightmare to read. Just like their list of pronouns and mental disorders.
That's what I get.
I just do other things while it runs.
It's kinda like RP with an actual person who also has to type and live life.
>implying MIT itself isnt shit
>no warranty
>keep copyright
Everyone can use it. That's it.
rent free
File: IMG_1488.png (367 KB, 1055x896)
>Everyone can use it. That's it.
How is that false?
File: bingo.png (152 KB, 498x402)
the licensesperg really doesn't stop. sign of autism.
you can even fork a MIT program to whatever troon license you want as our lovely Jart did indeed do. only nocoders really give a fuck though I've noticed
File: retard.png (301 KB, 668x735)
>How is that false?
>the sharteen and jart are ideological allies
File: ACK.jpg (132 KB, 760x704)
no I was being literal. both you and jart chimp out whenever MIT licenses show up
agpl>apache THOVGH
whatever license you simp over doesn't matter when no one uses whatever code you write THOUGH
File: thats the point.png (239 KB, 498x402)
>when no one uses whatever code you write
File: Untitled.png (552 KB, 720x915)
Large Language Models are Interpretable Learners
>The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge this gap. In the proposed LLM-based Symbolic Programs (LSPs), the pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts. Symbolic programs then integrate these modules into an interpretable decision rule. To train LSPs, we develop a divide-and-conquer approach to incrementally build the program from scratch, where the learning process of each step is guided by LLMs. To evaluate the effectiveness of LSPs in extracting interpretable and accurate knowledge from data, we introduce IL-Bench, a collection of diverse tasks, including both synthetic and real-world scenarios across different modalities. Empirical results demonstrate LSP's superior performance compared to traditional neurosymbolic programs and vanilla automatic prompt tuning methods. Moreover, as the knowledge learned by LSP is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable), and other LLMs, and generalizes well to out-of-distribution samples.
>for openbmb/MiniCPM-Llama3-V-2_5-gguf/ggml-model-Q4_K.gguf?
which damn file do I use and where is the help output? --help just gives:
./llama-gguf --help
./llama-gguf: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./llama-gguf)

the folder is full of shit:
llama-b3209-bin-ubuntu-x64/build/bin$ ls
LICENSE llama-q8dot
llama-baby-llama llama-quantize
llama-batched llama-quantize-stats
llama-batched-bench llama-retrieval
llama-bench llama-save-load-state
llama-bench-matmult llama-server
llama-cli llama-simple
... tl;dr
llama-lookup-stats test-sampling
llama-parallel test-tokenizer-0
llama-passkey test-tokenizer-1-bpe

>[2024 Jun 12] Binaries have been renamed w/ a llama- prefix. main is now llama-cli, server is llama-server, etc (ggerganov#7809)
what the fuck does all this mean? Last time I used llama.cpp was when it first came out on windows and now I'm trying to run multimodal on ubuntu and it's nothing like
I know I'm retarded. Please just tell me which button to press. is it llama-cli or llama-gguf for openbmb/MiniCPM-Llama3-V-2_5-gguf/ggml-model-Q4_K.gguf?
File: 1472860069099.png (191 KB, 600x979)
The girl: My GPU (8gb vram)
The burger: Models that won't fit exclusively on my GPU

Someone who is good at eating burgers please advise.
File: Untitled.png (722 KB, 1166x901)
Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning
>Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to train specialized adapters, which are then uploaded to a central platform to improve LLMs. This platform uses these domain-specific adapters to handle mixed-task requests requiring personalized service. Previous research on LoRA composition either focuses on specific tasks or fixes the LoRA selection during training. However, in UML, the pool of LoRAs is dynamically updated with new uploads, requiring a generalizable selection mechanism for unseen LoRAs. Additionally, the mixed-task nature of downstream requests necessitates personalized services. To address these challenges, we propose Retrieval-Augmented Mixture of LoRA Experts (RAMoLE), a framework that adaptively retrieves and composes multiple LoRAs based on input prompts. RAMoLE has three main components: LoraRetriever for identifying and retrieving relevant LoRAs, an on-the-fly MoLE mechanism for coordinating the retrieved LoRAs, and efficient batch inference for handling heterogeneous requests. Experimental results show that RAMoLE consistently outperforms baselines, highlighting its effectiveness and scalability.
No code. I remember some anons wanting something like this. there was a prior similar paper (that they cited but didn't test against it seems) https://arxiv.org/abs/2404.13628
I can't.
Get more RAM.
Can you fit 64gb?
Then you can mixtral at least until bitnet
how can I run this multimodal model?
>at least until bitnet
Why is it taking so long?
Money and risk.
> “I need 2400 gb vram? damn. Can I get away with less?” “Of course just drop the batch size from 1024 to 1 and you only need 10 gb”
Koboldcpp or llama.cpp running a gguf quant with some layers on cpu. Assuming you have regular ram.
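To pick how many layers go on the GPU, a back-of-the-envelope estimate is usually enough to get a starting value. The sketch below assumes all layers are about the same size and that some VRAM has to stay free for the context and compute buffers; the function name, the 26 GB file size, and the 1.5 GB reserve are illustrative guesses, not measured values. The result is the number passed to llama.cpp as `-ngl` (`--n-gpu-layers`) or to koboldcpp as `--gpulayers`.

```python
def gpu_layers(model_gb: float, n_layers: int, vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit on the GPU.

    Assumes equally sized layers and keeps reserve_gb of VRAM free for
    the KV cache and scratch buffers (both simplifying assumptions).
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # Hypothetical example: a 26 GB GGUF with 32 layers on an 8 GB card.
    print(gpu_layers(model_gb=26.0, n_layers=32, vram_gb=8.0))
```

If loading fails or generation crawls, lower the number by a few layers; longer contexts need a larger reserve.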
File: Untitled.png (119 KB, 1033x793)
Interpreting Attention Layer Outputs with Sparse Autoencoders
>Decomposing model activations into interpretable components is a key open problem in mechanistic interpretability. Sparse autoencoders (SAEs) are a popular method for decomposing the internal activations of trained transformers into sparse, interpretable features, and have been applied to MLP layers and the residual stream. In this work we train SAEs on attention layer outputs and show that also here SAEs find a sparse, interpretable decomposition. We demonstrate this on transformers from several model families and up to 2B parameters. We perform a qualitative study of the features computed by attention layers, and find multiple families: long-range context, short-range context and induction features. We qualitatively study the role of every head in GPT-2 Small, and estimate that at least 90% of the heads are polysemantic, i.e. have multiple unrelated roles. Further, we show that Sparse Autoencoders are a useful tool that enable researchers to explain model behavior in greater detail than prior work. For example, we explore the mystery of why models have so many seemingly redundant induction heads, use SAEs to motivate the hypothesis that some are long-prefix whereas others are short-prefix, and confirm this with more rigorous analysis. We use our SAEs to analyze the computation performed by the Indirect Object Identification circuit (Wang et al.), validating that the SAEs find causally meaningful intermediate variables, and deepening our understanding of the semantics of the circuit. We open-source the trained SAEs and a tool for exploring arbitrary prompts through the lens of Attention Output SAEs.
weights linked in appendix. probably only interesting for those who want to poke around
File: 4871575.jpg (6 KB, 150x150)
>assuming the oldfriend cute chibi vramlet burger chan poster doesn't know about ggufs
Is there a reason not to get an a6000 for training? Seems like a decent upgrade from 3090.
One day I'll get a job and buy a new computer. You'll see! (I won't though)
I mean, they're a small company, they can't risk giving their best model for everyone for free, look what happened to StabilityAI, they are on the verge of bankruptcy because of that
you hated him because he told the truth
File: file.png (159 KB, 600x600)
