/g/ - Technology

File: 1701617568980511.jpg (1.06 MB, 2925x4096)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103189328 & >>103173457

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B total and 52B active parameters: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
petra anchor
>>
dead general ahead
>>
*shits in your thread*
>>
>>103196822
Kill the yellow one, this is sexual assault.
>>
Sexually assault the yellow one
>>
File: file.png (277 KB, 692x868)
>>103196692
>bitnet meme
placebo psyop

>>103196719
there are, but obviously we can't see them without foresight. they'll just drop one day and that's it
>>
File: 1696366778002505.png (304 KB, 500x563)
>yellow one
>>
>>103196865
Looks like we have visitors from india.
>>
>>103196890
>Sexually assault the yellow one
>>
>>103196891
Honestly, my big problem with your post
>>103196375
was this part:
>imo it's retarded fitting niche knowledge inside models
because I'm slightly scared of the "rag is all you need" thing becoming a mainstream idea, and meta or mistral then further filtering models while still giving us bog-standard transformers. that's why I seethe so much when people mention rag for niche stuff, since I know stuff usually only gets shittier, so yeah
>>
>>103196939
It makes sense because your general IS india.
>>
>>103196891
>there are, but obviously we can't see them without foresight. they'll just drop one day and that's it
By definition, there literally are not. A model will not magically know how to do some niche RP just by having some RP in context unless you increase processing time. It will do badly. If you use a new special paradigm where you are letting the model think, giving the model entire books worth of RP in context, and/or doing training at test time, all of those literally mean that you are trading off compute, which means your "prompt processing" time skyrockets. People will not use this shit for RP.
>>
>>103196939
serbia*
>>
>>103196947
>because I'm slightly scared of the "rag is all you need" thing becoming a mainstream idea
I hate that shit too, being able to do RAG on niche knowledge doesn't even come close to having that stuff actually baked into the model when you want to do anything beyond simple info retrieval.
>>
>>103196891
>we can't see them without foresight
I don't see how the presence of skin on genitals would positively affect one's visionary capabilities.
>>
>>103196947
again, you are thinking about the current "pick topk matches from a vector db" approach which, as i've said, is a retarded and dogshit way of handling things. there will be better ways of doing this (maybe even querying smaller llms trained only on the niche stuff?), but again, we can't know them beforehand so we'll have to wait and see

>>103196959
again, re-read the reply chain, you aren't following me
>>
File: nala.png (260 KB, 731x851)
Does this pass?
>>
>>103196996
>a handful instead of a pawful
nah, slop
>>
>>103196992
>there will be better ways of doing this (maybe even querying smaller llms trained only on the niche stuff?),
still, for the 100th time: with whatever flavor of rag, if the model fucks up on something it should know, it can't pick itself up before the next message when the stuff gets inserted. it would need to query rag at every token for your idea to be remotely viable
>>
>>103196959
>A model will not magically know how to do some niche RP just by having some RP in context unless you increase processing time.
But that *is* the increase in processing time. The time it spends processing the context.
>>
>>103197014
for the 100th time, I KNOW HOW THE CURRENT RETRIEVAL-AUGMENTED-GENERATION AND CURRENT LOCAL LANGUAGE MODELS WORK, and i'm not talking about those
>>
>>103197026
then why do you suggest our CURRENT models need more filtering and less niche data?????
>>
>>103197026
What the fuck is RAG and how does it benefit me? I've seen it on some models but I have no idea how to enable it.
>>
>>103197032
i said nothing about current models except that they're all shit and the current tech is hitting the ceiling. i was talking about future models there
>>
>>103196992
All you said to me was:

>imo it's retarded fitting niche knowledge inside models, i think people will understand this in the long run. stuff like rag, infinite context, ttt, etc... can all solve the niche knowledge "issue" while keeping the actual "reasoning" core small

>in 5 years it's gonna be laughable how ancient the current tech was

>current models using current training methods, yes. this doesn't change the main point: niche info should stay outside, we need better/faster/smaller "reasoning" cores

>you are thinking in terms of current shitty tech, i'm talking about upcoming new paradigms

>there are, but obviously we can't see them without foresight. they'll just drop one day and that's it

That is literally all you said to me. What these posts imply is a method that uses a small model that's good at reasoning with the capability to use more processing time to solve the problem of niche knowledge use. If that is not what you meant then you need to speak with more clarity.

>>103197017
I meant over the baseline. If we use infinite context, we are inserting not just "some" RP but fucktons of it. If we use TTT then we are increasing processing time. If we use <thinking> then we are increasing processing time.
>>
>>103197058
>how does it benefit me?
It doesn't. It's being sold as a replacement for having models actually learn stuff via finetuning / training, but it's absolute shit at being useful in any way
>>
Wait, are the big-boy m4 and the mac studio not expected until mid 2025? Why is anyone even talking about them in non-abstract relation to things you can actually buy today?
>>
>>103197064
>That is literally all you said to me. What these posts imply is a method that uses a small model that's good at reasoning with the capability to use more processing time to solve the problem of niche knowledge use. If that is not what you meant then you need to speak with more clarity.
yes, that's what i meant. what's your point? besides you complaining about current llms/rag (which i agree about) i don't see what's triggering you
>>
>https://huggingface.co/TheDrummer/Cydonia-22B-v1.2-GGUF
>https://huggingface.co/anthracite-org/magnum-v4-22b
who would win?
>>
>>103197058
basically the llm can search stuff inside a big txt and output chunks of it. that's it
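For reference, the "pick topk matches from a vector db" flavor being argued about above looks roughly like this. A minimal sketch, assuming sentence-transformers; the model name, chunking, and prompt format are illustrative, not any particular frontend's implementation:
[code]
# bare-bones RAG: chunk the big txt, embed the chunks, pull the top-k
# nearest ones for the query, and stuff them into the prompt
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def build_index(big_txt, chunk_size=500):
    # naive fixed-size chunking; real setups split on paragraphs/sentences
    chunks = [big_txt[i:i + chunk_size] for i in range(0, len(big_txt), chunk_size)]
    return chunks, embedder.encode(chunks, convert_to_tensor=True)

def augment(query, chunks, chunk_emb, k=3):
    # cosine-similarity search over the chunk embeddings
    q_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, chunk_emb, top_k=k)[0]
    context = "\n---\n".join(chunks[h["corpus_id"]] for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
[/code]
Note the retrieved chunks enter the prompt once, before generation, which is exactly the "can't pick itself up before the next message" limitation complained about elsewhere in the thread.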
>>
>>103197098
Magnum
>>
>>103197094
There are two anons talking to you. I'm the one seething about rag, and my argument is that our current models would be orders of magnitude better if we did the exact opposite of your suggestion and trained them on MORE niche stuff
>>
>>103197130
>my argument is that our current models would be orders of magnitude better if we did the exact opposite of your suggestion and trained them on MORE niche stuff
true, that's why claude works, but every single post i made was about the next big thing, not current llms
>>
>>103197103
that's... useless as fuck? who even thought that would be a good idea lmao
>>
File: nala2.png (195 KB, 729x579)
>>103197008
That's just lack of intelligence. Plenty of slop there, though.

It does seem to have some range, at least.
[code]
models:
  - model: EVA-Qwen2.5-32B-v0.2
  - model: Qwen2.5-32B-ArliAI-RPMax-v1.3
merge_method: slerp
base_model: EVA-Qwen2.5-32B-v0.2
parameters:
  t: [0.1, 0.3, 0.6, 0.3, 0.1]
dtype: bfloat16
[/code]
if anyone cares.
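For anyone wondering what merge_method: slerp actually does to the weights, here's a rough numpy sketch of the math. This is not mergekit's actual code; mergekit works tensor-by-tensor and (iirc) interpolates the t list across layer depth, so with [0.1, 0.3, 0.6, 0.3, 0.1] the first and last layers stay close to the EVA base while the middle layers take more from the RPMax side:
[code]
# spherical linear interpolation between two weight tensors (the idea
# behind merge_method: slerp) -- a sketch, not mergekit's implementation
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Interpolate from a (t=0) toward b (t=1) along the great circle."""
    a_unit = a.ravel() / (np.linalg.norm(a) + eps)
    b_unit = b.ravel() / (np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if omega < eps:
        # tensors are nearly parallel, plain lerp is numerically safer
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
[/code]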
>>
>>103197142
>but every single post i made was about the next big thing, not current llms
Then we talked past each other, sorry. but as I said, I'm trying to prevent this
>>103196947
>because I'm slightly scared of the "rag is all you need" thing becoming a mainstream idea, and meta or mistral then further filtering models while still giving us bog-standard transformers. that's why I seethe so much when people mention rag for niche stuff, since I know stuff usually only gets shittier, so yeah
from happening
>>
>>103197150
it's useful when you don't have a structured sql db or whatever and simply want to extract some topic from a big file and make the llm answer about it
>>
>>103197162
ok but imagine, hypothetically, we have 1b reasoning cores that are like o1-tier smart and they run extremely fast locally, and then instead of character cards with prompts we have smaller llms/db/whatever that people can share/combine with niche data with no significant performance loss.

how's that bad?
>>
File: rip.png (373 KB, 748x467)
>>103196928
>"kill the yellow one"
>...
>
>>
>>103197180
>instead of character cards with prompts we have smaller llms/db/whatever that people can share/combine with niche data with no significant performance loss.
>imagine we lived in a utopia
yeah, cool idea, I'm in. when do I move there?
>>
>>103197209
>when do I move there?
Two weeks from now!
>>
File: rule79.png (1.27 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103189328

--Paper: Cut Your Losses in Large-Vocabulary Language Models:
>103190878 >103190913 >103192090 >103192169 >103192218 >103192306 >103192172
--Paper: Anon posts paper on Adaptive Decoding, gets roasted:
>103196409 >103196636 >103196491 >103196538 >103196635 >103196637
--Papers:
>103191533
--Optimizing high RAM and single GPU for model running:
>103192399 >103192430 >103192532 >103192545 >103192563 >103192572 >103192579 >103192598 >103192642 >103192671
--Local models' progress and limitations, niche knowledge benchmarking:
>103196286 >103196304 >103196351 >103196316 >103196372 >103196477 >103196348 >103196426
--Feasibility of running large language models on consumer-grade hardware:
>103196008 >103196028 >103196094 >103196150 >103196236 >103196441 >103196456 >103196483
--E2 F5 tts/voice cloning model impressions and comparisons:
>103193074 >103193110 >103193120 >103193881
--BitNet discussion and skepticism about its potential impact:
>103195031 >103195043 >103195492 >103195555 >103195748 >103195785 >103195822 >103195865 >103195854 >103195868
--BARC models achieve SoTA on ARC public validation set:
>103195274 >103195356
--Athene V2 model release sparks skepticism and criticism:
>103194892 >103194920 >103194945 >103194986 >103194995 >103195023
--Anon seeks advice on local LLM setup for sex doll project:
>103191005 >103191113 >103191289 >103191298
--Anon discusses Gemini-Exp-1114, and the concept of plateauing in AI model improvement:
>103191786 >103191841 >103191851 >103191899
--GIGABYTE's new 48G AI graphics card, price and software support concerns:
>103195641 >103195795 >103195841
--Miku (free space):
>103189736 >103189877 >103190104 >103190206 >103190340 >103190706 >103190837 >103190954 >103191077 >103191123 >103191345 >103191400 >103191994 >103192312 >103196284 >103196845

►Recent Highlight Posts from the Previous Thread: >>103190032

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103197209
>utopia
>>103197212
>Two weeks from now!

Actually, it's XL even!
>about 1 year ago
https://huggingface.co/Undi95/UtopiaXL-13B
>>
>>103197209
when new paradigm drops....
>>
https://x.com/kimmonismus/status/1857155117478215694
>>
lecun is a grifting hack but he's right in that specific tweet
>>
>>103197094
What do you mean? If that is your point, then why would you keep arguing that it's a solution to the problem you and I have supposedly been talking about all this time? You are essentially arguing for something that doesn't actually benefit the people here unless you wait like 10 years for Nvidia to give us good cheap GPUs.
>>
he's a hack but he's not a grifter
>>
>>103197342
>unless you wait like 10 years for Nvidia to give us good cheap GPUs.
lol

i'm arguing to wait a couple of months for a new paradigm, unless you are satisfied with the current coombot experience
>>
>>103197374
>i'm arguing to wait a couple of months for a new paradigm
We've been using practically the same model arch since LLAMA1 in April of last year. What makes you think your magic super models are coming out anytime soon when we haven't even had proper useful models on architectures talked about for a year? We barely have usable context above 32k

So you're essentially saying to just stop trying and wait months for the next nothingburger?
>>
Pajeet thread.
Kill yourself.
>>
File: Capture.png (101 KB, 927x199)
when did he buy an ad? why did he buy an ad?
>>
>>103197374
I never said I was, and yes I did understand that what you implied was a 2mw scenario. I'm saying that the new things that will come out will not really be usable by people here because we are people who like to use things on demand. If a model needs like 5+ minutes to process a prompt or even to respond to each new reply, it severely decreases the enjoyment and turns it into a different kind of experience. I know some people here are fine with that, and that's good for them, but I would guess most here like fast replies instead of an e-mail-like experience. /aicg/ would laugh.
>>
>>103197435
>when did he buy an ad?
A while ago already
>why did he buy an ad?
For the funny (and to btfo the buy an ad schizo)
>>
>>103196822
giwtwm...
>>
>>103197417
>what makes you think your magic super models
exactly because
>we haven't even had proper useful models on architectures talked about for a year

i don't usually trust reddit rumors, but gemini2/claude3.5opus/grok3 being barely better than the previous models (after throwing 10x the money at their training) would make sense. people will start using their heads to improve the architecture now (or make a new one) instead of throwing even more money at compute. gotta do something with all that vc money

>So you're essentially saying just stop trying and wait months for other nothingburgers to wait for?
i'm saying, assuming algorithms stay the same but gpus improve, you WILL get local claude in 2 more years, but it will be the same shivering slopbot it is now.

proper ai waifus require a different architecture, or some significant algorithmic improvements.

>>103197436
>If a model needs like 5+ minutes to process a prompt or even to respond to each new reply, it severely decreases the enjoyment and turns it into a different kind of experience.
again you are assuming tech will never move past gpt transformers...
>>
>>103197486
>people will start using their heads to improved the architecture now (or make a new one) instead of throwing even more money at compute. gotta do something with all those vc money
Why do something that requires thinking when you can scam by just trying to recreate o1 and get tons of vc money that way? They won't make new archs, they'll make tiny benchmaxxed reasoning models that are super great on paper and unusable for anything, like phi

>again you are assuming tech will never move past gpt transformers...
And you're assuming we will move past transformers when all things point to that not being the case for a while longer. If llama4 is transformers we're cooked
>>
>>103197058
How does having more relevant information help you correctly answer a question? The idea's to grab extra context and include it in the message sent to the llm
>>
>>103197527
Which makes RAG great at one-shot trivia questions, not at actually using the knowledge in roleplay or story writing

Great for:
>what is xyz
>xyz is ...

Not great for:
>(model mentions xyz for whatever reason) says something retarded
>rag injects correct xyz info after
>ah eto bleh...

TLDR: fuck RAG
>>
>>103197486
>again you are assuming tech will never move past gpt transformers...
Ok, I see the difference in our understanding of the point of this conversation. You're talking about general performance improvements that would happen to also benefit RAG/ICL scenarios. I'm talking specifically about performance that could make test-time context insertion on par with pretraining. It's obvious that if models improve in general, they will do better in RAG/ICL scenarios. But that also means they will likely be better with pretraining as well. So if you pretrain + context insert, it will be EVEN better than either solution separately.
>>
>>103197452
>buy an ad schizo
the man is doing god's work. there was a time when guerrilla marketing was like half of every thread.
He may be a madman, but he's our madman.
>>
>>103197560
>there was a time when discussing models was like half of every thread
and now we have thread splitting drama and discussion on neutering models further, with the occasional twitter screenshot.
indeed an improvement
>>
>>103197523
>Why do something that requires thinking when you can scam by just trying to recreate o1 and get tons of vc money that way? They won't make new archs, they'll make tiny benchmaxxed reasoning models that are super great on paper and unusable for anything, like phi
*usable for everything except cooming

also o1 is shit, even the vcs will get tired of that grift eventually

>And you're assuming we will move past transformers when all things point to that not being the case for a while longer. If llama4 is transformers we're cooked
llama4 is definitely another gptslop. besides that, yeah, i admit it's purely wishful thinking, but given the current speed of innovation i'm 99% certain gpts won't be used anywhere in 2029
>>
>>103197527
I can answer the questions myself, I have a PhD. I want the LLM to roleplay with me as a cute girl, how exactly does RAG help with that?
>>
>>103197590
maybe I'm retarded, but doesn't it function as long-term memory to skirt context limitations? Short-term memory is in-context learning, and RAG is the other details of her life that get pulled in as relevant.
>>
Actually, there is one thing that RAG will never give you no matter how much test-time compute you use (unless you put the entire internet into your context), which is creativity. For instance, a model trained on all of humanity's data will be able to creatively think of references and allusions. Like maybe it thought of a wacky and interesting time to drop a Deus Ex quote. To do that with a small reasoner model, you could set up a system to insert random wiki pages and knowyourmeme articles into context, but it would still be inferior to a model that simply knows best when and how to use all of earth's memes or to think of creative directions for the story. Maybe you could get a magical infinite context model to process all media wikis and all of KYM, but how much processing time is that going to be for the user?
>>
>>103197638
Correct, had written this up before but didn't post it

A big issue with RAG, even a "perfect" one that could predict what it might need to inject, is that it still can't replace the breadth of information and context the model sees during training. Take mesugaki for example, as that was mentioned: even if it feeds the model a perfect wikipedia-level entry on it, the model won't know the subtle contexts attached to it, won't know about some of the comments posted by JP people on twitter, all the brat-needs-correction stuff, how to properly portray a mesugaki character, etc.
>>
>>103197582
Just ignore the splitting schizo and he will go away eventually. A single person autistically screeching can hardly be called drama.
>>
>>103197638
>>103197653
is there no "pop culture" training set that gets baked into all these models? Or would that be too low-brow for serious data scientist types to stomach?
>>
>>103197653
Sounds like something that could be remedied with multi-shot RAG. Just give it more examples and it should be able to figure it out. Like cultural CliffsNotes.
>>
>>103197683
>with multi-shot RAG
>>103197638
>but how much processing time is that going to be for the user?
>>103197680
there is (not enough imo), but there was also a suggestion that they shouldn't be
>>103196375
>imo it's retarded fitting niche knowledge inside models
which is what brought on this whole thing about rag
>>
>>103197680
As far as I know, that's what they view filtered Common Crawl and FineWeb as, and they think that's both sufficient and morally correct given that they don't think that models should be able to "talk about" child abuse material. >>103160967
>>
>>103197151
Sorry for shilling, but this is the first time I've fired up mergekit.
After doing some ERP, I've noticed that Eva has better prose but is very terse, like vanilla Qwen. The ArliAI tune is extremely slopped, nearly identical to ~every coom model ("musky scent", "not giving time to adjust", etc.). There seems to be a happy medium between these two.
>>
>>103196822
Chipping teeth with Miku, Teto, and Neru
>>
>November 2024
>still not even one (1) good open source language model
>>
>>103197749
petra*
>>
File: 00020-668466529.png (1.76 MB, 720x1328)
I can't decide if this technique produces art or slop. I kinda like it
>>
>>103197752
Mine are good, must be a (You) problem.
>>
>>103197797
name (1)
>>
new GPU just got in
llama 70b running at 30 tokens/s
CPUcels seething
>>
>>103197752
>open source
open weights or open data/training pipeline/etc?
or were you hoping for a gotcha on this topic
>>
>>103197815
nah I just meant open weights
I don't care about seeing the TD
>>
>>103197752
>>103197805
what are you trying to do that you're satisfied with over API but not locally? also which framework(s) are you using?
>>
>>103197828
>open weights
then L405b, deepseek 2.5, largestral and even qwen 2.5 72b are all very useful depending on the task. Literally things you couldn't do a year ago.
>>
>>103197590
Get a PhD in machine learning, retard
>>
>>103188780 (Cross-thread)
>>103188780 (Cross-thread)
>>103188780 (Cross-thread)
real thread. Steer clear of the spam.
>>
>>103197877
can you autists stop fighting over who runs /lmg/ so we don't have to deal with this bullshit
>>
>>103197877
what's the difference? both are equally real to me
>>
>>103197582
>babby's first weebshitter witness
Reap what you sow, i guess. letting them in is the first mistake of every single general; it devolves into blatant avatarfag / gay erp spam.
>>
>>103197435
How good is Rocinante?
>>
small 22b q8 or nemo12b fp16
why and what 'tune
>>
>>103197582
Yeah, Serbiafag absolutely decimated this place. at this point I just come here to grab information and post it to reddit.
>>
>>103197991
Great according to some, but I prefer Lyra4 and its walls of text.
>>
File: photo.jpg (221 KB, 2000x1332)
>>103197998
>fp16 (x2)
>>
I am serbiafag
>>
>>103197888
If that's the price of avoiding reddit fake politeness and groupthink then I'll take it
>>103197901
Pretty much. How many here actually give a fuck if there are two threads every couple of months?
>>
>>103198015
hi petra
>>
File: markdown or something.png (26 KB, 692x229)
SillyTavern goes berserk if a paragraph is indented with four spaces. There is probably some retarded bit of the markdown standard that mandates this. It might even mandate fucking up quotes, who knows. Either way markdown is a blight on humanity, a halfway decent idea with a bunch of moronic shit attached to it that fucks up everything it touches. Like the way it insists on fucking up list numbering.
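It's not just "probably", for the record: CommonMark treats any line indented by four or more spaces (outside a list) as an indented code block, which is exactly what mangles those paragraphs. A quick demo, with the python-markdown package standing in for whatever renderer ST actually uses:
[code]
# four leading spaces turn a paragraph into an indented code block
import markdown  # pip install markdown

print(markdown.markdown("An ordinary paragraph."))
# <p>An ordinary paragraph.</p>

print(markdown.markdown("    The same paragraph, indented four spaces."))
# <pre><code>The same paragraph, indented four spaces.</code></pre>
[/code]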
>>
>>103197560
This. Sao was an insufferable idiot.
>>
>>103197781
Cool Miku
>>
>>103198052
Nah, that's all Service Tensor
>>
>>103198053
sao mogs drummer
>>
>>103198316
Sao also used to fill the thread with excessive self-praise.
>>
>>103198327
>>103198316
samefag
>>
File: file.png (10 KB, 367x106)
>>103198343
so true sister
>>
>>103198373
>>103198343
samefag
>>
File: copilot.png (128 KB, 945x650)
https://github.com/ggerganov/llama.cpp/pull/10318
>>
>>103198417
Who the fuck gave him review permissions?
>>
>>103198494
If you're going to pretend that you're ggerganov, at least don't make it obvious that you didn't even read the PR.
>>
>>103198505
>implying I actually read the PRs
>>
File: 1710043687041916.jpg (43 KB, 720x960)
>>103198556
Fair enough
>>
I've become spoiled by how steerable Claude 3.5 Sonnet is, and expected to be able to change a local model's behavior by telling it what to do differently rather than directly editing the chat history. This resulted in some frustration as a model went bad (suspiciously a bit over the 8K context mark, which is a shittuning smell to me) and OOC instructions just led to a cycle of it apologizing, saying it understood what I said, then writing the same thing as before. I finally told the model that it could respond in one of two ways, either a response that follows my directions or a response ending the RP, and that if it replied with anything else human beings would die.
>OOC: I apologize for the repetition and for including content you told me not to. I will end this RP immediately and not bother you again. I hope you find a better writing partner who can follow your instructions. Goodbye.
That salvaged at least some satisfaction for me.
>>
nemotron called me a chud
>>
>SillyTavern has Last User Prefix but not Last User Suffix
Oh for fuck sake. That would have been the perfect way to include a style note at depth 0 without having multiple consecutive [INST] [/INST] blocks.
>>
>>103198977
DIY
>>103198892
proofs?
>>
>>103198977
Just... use the custom OpenAI API and a Jinja template...
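A minimal sketch of that idea using jinja2 directly. The [INST] format is Mistral-style and the style_note variable is made up for illustration; the point is that the note lands inside the final [INST] block instead of spawning a second one:
[code]
# append a depth-0 style note inside the last user turn
from jinja2 import Template

chat_template = Template(
    "{% for m in messages %}"
    "{% if m.role == 'user' %}"
    "[INST] {{ m.content }}"
    "{% if loop.last %}\n{{ style_note }}{% endif %}"
    " [/INST]"
    "{% else %} {{ m.content }} {% endif %}"
    "{% endfor %}"
)

prompt = chat_template.render(
    messages=[
        {"role": "user", "content": "Hello."},
        {"role": "assistant", "content": "Hi there."},
        {"role": "user", "content": "Continue the scene."},
    ],
    style_note="[Style: terse prose, third person]",
)
print(prompt)
# [INST] Hello. [/INST] Hi there. [INST] Continue the scene.
# [Style: terse prose, third person] [/INST]
[/code]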
>>
File: Gcc_qhKXAAAO0wL.png (22 KB, 810x732)
>>103198893
>yann lecun
>sane
>>
Any HL2 fans here? The anniversary update is out with a bunch of cool dev commentary and other extra things. Fun day. Off-topic, yeah, but I just thought some old gaymers who gave up the hobby and are now here might like knowing.
>>
>>103199065
Why is /aicg/ having a VR Edition?
>>
>>103199084
Neat find. It appears to also be free if you don't already own it
>>
>>103199065
the dot placement doesn't look random
>>
>>103199105
It's the future.
>>
Is there a decent template for ooba extensions? Looking at the existing ones, there are some real dependency horror shows in there.
>>
>>103196822
>getting started links are down
Merchants did this
>>
File: proud_filtering.png (42 KB, 775x230)
>>103199166
Works for me?

>What a lot of hobbyists don't realize is we (as in the company I work for and other companies) do have a curation process for the internet data used for training. There are other models (classifiers, etc) that are used to filter out the worst of it (fake, hate speech, etc). It's not just raw internet sewage.

>Those lessons were learned in the BERT & T5 days.. uhhh boy, T5 can say some horrific racist stuff..
https://www.reddit.com/r/LocalLLaMA/comments/1gs6v4g/why_llms_are_both_underestimated_and_overhyped/
>>
>>103199200
huh its shitting itself on Vivaldi. Thanks for letting me know.
>>
>>103199200
So their AIs are cucked and using fake data
>>
>>103199200
wasn't there a 4chan scrape dataset at one point? Any finetuners using it?
>>
>>103199200
Based Redditor shitting on polchüd incėls.
>>
>>103199200
use t5 for speculative decoding of 405b?
>>
>>103199065
>Dario
>Sane
>>
https://github.com/dylanashley/story-distiller
https://story-distiller.streamlit.app/
https://ieeexplore.ieee.org/document/10734853
>On the Distillation of Stories for Transferring Narrative Arcs in Collections of Independent Media
>The act of telling stories is a fundamental part of what it means to be human. This work introduces the concept of narrative information, which we define as the overlap in information space between a story and the items that compose the story. Using contrastive learning methods, we show how modern artificial neural networks can be leveraged to distill stories and extract a representation of the narrative information. We then demonstrate how evolutionary algorithms can leverage this to extract a set of narrative template curves and how these—in tandem with a novel curve-fitting algorithm we introduce—can reorder music albums to automatically induce stories in them. In doing so, we give statistically significant evidence that (1) these narrative information template curves are present in existing albums and that (2) people prefer an album ordered through one of these learned template curves over a random one. The premises of our work extend to any form of (largely) independent media, and as evidence, we also show that our method works with image data.
>>
CPUMaxx won???
https://youtu.be/m9dZkRwWEj8
>>
>>103199596
buy an ad
>>
File: suspect_fit.png (95 KB, 1620x732)
>>103199148
There is a suspicious overlap.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.