/g/ - Technology

File: 1705806843225442.jpg (1.96 MB, 2400x3346)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103173457 & >>103164659

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B total and 52B active parameters: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Kill yourself.
>>
>>103188780
>>103188780
>>103188780
op is a filthy thread splitter. spit on him.
>>
>>103189341
hi, petrus? how's serbia treating you?
>>
>>103189328
there's already a thread, retard
>>
>>103189372
That's a troll thread.
>>
>>103189378
>i don't like it.
>it is a troll!
you are a redditor.
>>
>petra starts posting here again
>early bakes with his favorite anime girl happens again too
weird
>>
>>103189427
hi petrus, why are you posting in the wrong thread?
>>
>>103189378
Make your case about what's wrong with the other thread.
I don't see blacked shit in the OP or anything.
As far as I can tell, this is just starting a flame war.
>>
>>103189515
Kurisufag/Petra/blackedmikuanon/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon has a history of trolling, see: >>103164618
That's reason enough to ignore his thread.
>>
>>103189515
>As far as I can tell, this is just starting a flame war.
Indeed, making the kurisu thread clearly had that intention
>>
>>103189536
>Kurisufag/Petra/blackedmikuanon/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon
Just so you know you come off as a complete schizo right now. So keep going. I am having a blast.
>>
>>103189536
>>103189544
So there's nothing inherently wrong with the other thread then.
Understood.
Thank you for clarifying.
>>
>>103189515

>>103188802
>pretending /lmg/ is relevant enough for things like early bakers and thread wars these days
>>103188810
>I just want the psycho to split the thread again and then make some samefag posts with his model.
>>
>>103189562
>I just want the psycho to split the thread again
You did just that so congratulations playing into his hand.
>>
>>103189560
There's something inherently wrong. It's created with the purpose of trolling.
>>
>>103189570
>trolling
Isn't that unsafe and against the rules of 4chan?
>>
>>103189328
Maybe one day AI will be able to do anatomy so good it can generate this kind of image easily.
>>
File: 1730125415014155.png (6 KB, 298x169)
>>103189328
>>
Is there anything better than Magnum v4 for ERP?
>>
>>103189783
everything in the world
>>
>>103189536
Makes sense that the loser with no life is from Russia.
>>
>>103189328
>Thread Theme:
https://www.youtube.com/watch?v=6Y4b25CYkkg
>>
>>103189783
try mythomax
>>
Gojo is way cooler than petra btw.
>>
>>103189911
This, there was a guy posting a lot of Miku bot replies and they were pretty impressive.
>>
►Recent Highlights from the Previous Thread: >>103173457

--LLM training and probability modeling:
>103176841 >103177202 >103177445 >103177735
--Anons discuss the state of AI progress and the importance of high-quality data:
>103176961 >103177097 >103177340 >103178331
--Troubleshooting Qwen coder performance issues:
>103174082 >103174157 >103174288 >103174364
--Sarashina2-8x70b discussion, with hardware and model specs:
>103175207 >103175213 >103175230 >103175306 >103175501 >103175599 >103176901 >103177013
--Running sarashina2 on a 4060ti with memory and batch size adjustments:
>103175313 >103175494
--Running qwen model locally with kobold and comparison with ollama:
>103174683 >103174754 >103174800 >103174884 >103174783
--Quantization type and model performance discussion:
>103174662 >103175030 >103175119
--INTELLECT-1 project nearing completion, training progress and metrics shared:
>103187105 >103187169 >103187214 >103187198 >103187372 >103187383
--FOSDEM 2025 Low-Level AI Engineering & Hacking Dev Room announced:
>103173860
--Anon compares performance of two AI models, puzzled by slower generation despite more GPU layers:
>103174513
--Anon claims to have found a 22B model rivaling mythomax:
>103181147 >103181158 >103181319 >103181337 >103181440
--AI model limitations and potential improvements:
>103178266 >103179566 >103179737 >103179953 >103180201 >103180347 >103182626 >103180168 >103180339
--Nemotron 70B optimization and formatting issues:
>103185721 >103185918 >103186031
--Athene-V2 model introduction and skepticism:
>103187457 >103187531
--Anon thinks altman implied ARC AGI is just a "meme eval":
>103186291
--NexusBench: function call, tool use, and agent benchmarks:
>103187563
--Comparison of qwen 2.5 and llama models' entropy levels:
>103173668 >103176180
--Miku (free space):
>103176710 >103177770 >103180955 >103181848

►Recent Highlight Posts from the Previous Thread: >>103173461

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: ComfyUI_00794_.png (1.07 MB, 1024x1024)
>>103189352
I'd rather spit on you.
>>103189783
Going to give Behemoth-v1.1-Magnum-v4-123B a try
>>
File: 17234586894453.png (537 KB, 512x512)
>>103189536
>hes serbian
HAHAHHAHAHAHAHAHAHAHA
>blue haired anime girl makes him seethe

>>103189515
>This is just starting a flame war.
No shit? Report the thread, it really is just that easy.
>>
>>103189783
Yes. Anything else.
>>
>>103189783
Mixtral LimaRP Zloss
>>
Pygmalion has been doing god knows what. Like seriously, what have they been up to?
>>
Any models as good as Opus for RP?
>>
Always nice to see people making friends and getting along well with each other.
>>
>>103190306
The new Sonnet 3.5.
>>
>>103190351
Okay, what about local
>>
>>103190362
Magnum v4 72B.
>>
>>103190340
*push*
>>
>>103189570
>waah waah trolling! jannies halp mee!
back you go >>>/reddit/
>>
>>103190369
Can you post logs of it proving it's as good as Opus in intelligence and context size?
>>
What's a good non-slopped 70b model for fiction? I have been using Llama 3 Instruct Storywriter and I am looking for an upgrade. I asked in the last thread and someone suggested Nemotron and when I tried it there was enough slop in it to be bothersome
>>
>>103189515
Nta but here's your tldr: waifufag OP gets mad every single time when new thread without miku pic is created, early bake or not.
It's literally just that with some wannabe doxxer schizobabble: >>103189536
>>
>>103190391
It's a local model, run it yourself.
>>
>>103190427
This is fake news, disregard Serbiafag
>>
>>103189515
This poster has been chasing away actually valuable users and taking away from the conversation for a year+ now since no one wanted to join his shitty discord server. Post like this >>103190427 is his way of ruining the general and keeping the attention on him or his autistic idea of what the general should be.
>>
>>103190393
Try Magnum v4 72B.
>>
>>103190440
how? I don't have a good PC, so I can't.
>>
>>103190393
As your post demonstrates, "good" is subjective.
Just pick one and try it, and see if you find it "good":
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>>
rentry.org/itsfunny

It was me that doublebaked by the way.
>>
>>103190494
Who?
>>
i fucking love double baking its so funny

my face is so punchable too but you'd never know haha losers
>>
>>103190513
asaproxy owner doe?
>>
>>103190531
But I made the Miku bake.
>>
Literal tranny thread, y'all never beat the allegations with this shit.
>>
>>103190540
>But I made the Miku bake.
But you didn't. I did. Retard
>>
>>103190485
Then what does it matter.
>>
>>103190598
dunno? Post a good opus model that can run on anything
>>
>>103190614
https://packagist.org/packages/andreskrey/shitty-markov-generator
>>
>>103190667
not as good as opus for rp doe
>>
>>103190673
you want to run on a potato, you're gonna get what you're gonna get.
>>
File: ComfyUI_00055_.png (1.24 MB, 1024x1024)
for me...
>>
>>103190692
cool, so theres no good local models that are free that mog opus since you need hardware?
>>
>>103190730
Yes, the technology is inherently power-hungry. The cloud stuff you're using has literal million dollar servers on the back end.
You can probably get creative and replicate something claude-esque for $1-15k depending on how creative you are and how long you're willing to wait for responses
>>
>>103190767
> The cloud stuff you're using has literal million dollar servers on the back end.
So learn to scrape and host it for free doebeit zoebeit boebeit? A lot of proxyhosts do that. Why don't you, sir/ma'am?
>>
>>103190487
I am looking for suggestions on what other people think is good for stories and doesn't have a lot of slop. It doesn't make any sense to go through the whole leaderboard one at a time, especially since they all are just gaming benchmarks anyways
>>
>>103190778
you can. many do. the folks in LOCAL MODELS GENERAL don't, obviously.
We value privacy, autonomy, self determination and control over our own technology.
>>
>>103190797
>you can. many do. the folks in LOCAL MODELS GENERAL don't, obviously.
So can you please just... prove it? Do you have a keycount you can show? :)
>>
File: ComfyUI_00071_.png (1.04 MB, 1024x1024)
for me...
>>
>>103190823
>keycount
What are you asking exactly?
>>
File: Untitled.png (1.8 MB, 1080x3392)
Cut Your Losses in Large-Vocabulary Language Models
https://arxiv.org/abs/2411.09009
>As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one single layer: the cross-entropy in the loss computation. Cross-entropy builds up a logit matrix with entries for each pair of input tokens and vocabulary items and, for small models, consumes an order of magnitude more memory than the rest of the LLM combined. We propose Cut Cross-Entropy (CCE), a method that computes the cross-entropy loss without materializing the logits for all tokens into global memory. Rather, CCE only computes the logit for the correct token and evaluates the log-sum-exp over all logits on the fly. We implement a custom kernel that performs the matrix multiplications and the log-sum-exp reduction over the vocabulary in flash memory, making global memory consumption for the cross-entropy computation negligible. This has a dramatic effect. Taking the Gemma 2 (2B) model as an example, CCE reduces the memory footprint of the loss computation from 24 GB to 1 MB, and the total training-time memory consumption of the classifier head from 28 GB to 1 GB. To improve the throughput of CCE, we leverage the inherent sparsity of softmax and propose to skip elements of the gradient computation that have a negligible (i.e., below numerical precision) contribution to the gradient. Experiments demonstrate that the dramatic reduction in memory consumption is accomplished without sacrificing training speed or convergence.
https://github.com/apple/ml-cross-entropy
neat
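Rough sketch of the idea in plain PyTorch (not their kernel, just the math): chunk over the vocab so the full [tokens, vocab] logit matrix never exists at once, keep a running log-sum-exp, and only ever pull out the ground-truth logit:
[code]
import torch

def chunked_cross_entropy(hidden, classifier, targets, chunk=4096):
    # hidden: [N, D] final hidden states, classifier: [V, D] lm_head weight, targets: [N] token ids
    N = hidden.shape[0]
    running_max = torch.full((N,), float("-inf"), device=hidden.device)
    running_sum = torch.zeros(N, device=hidden.device)
    target_logit = torch.empty(N, device=hidden.device)
    for start in range(0, classifier.shape[0], chunk):
        logits = hidden @ classifier[start:start + chunk].T        # [N, chunk], never [N, V]
        new_max = torch.maximum(running_max, logits.max(dim=-1).values)
        running_sum = (running_sum * torch.exp(running_max - new_max)
                       + torch.exp(logits - new_max[:, None]).sum(dim=-1))
        running_max = new_max
        in_chunk = (targets >= start) & (targets < start + chunk)  # ground-truth logit, if it lives in this chunk
        target_logit[in_chunk] = logits[in_chunk, targets[in_chunk] - start]
    lse = running_max + torch.log(running_sum)                     # log-sum-exp over the whole vocab
    return (lse - target_logit).mean()                             # mean negative log-likelihood
[/code]
Peak extra memory here is N*chunk floats instead of N*V; as far as I understand the abstract, the paper's contribution is doing this (and the backward pass) inside a fused kernel so even the chunk never hits global memory.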
>>
>>103190845
>What are you asking exactly?
For the amount of AWS keys with Opus you have.
>>
>>103190890
zero. I'm here for a reason.
I haven't used a cloud model for six month plus
/aicg/ is down the hall
>>
>>103190897
>zero. I'm here for a reason.
That reason being you're a big fat techlet?
>>
>>103190878
I feel like a groundbreaking paper detailing newfound efficiencies in LLMs comes out every week, but I rarely see anything practical come from them on the consumer-grade side.
>>
>>103190900
>That reason being you're a big fat techlet?
Yes. I'm fat, lonely, bald, malodorous and functionally retarded.
We don't have any cloud keys. You're in the wrong general.
>>
>>103190706
das a good OG subaru, needs better proportions for miku to subaru ratio though
>>
>>103190919
post a better general with free opus access
>>
File: ComfyUI_00088_.png (1.12 MB, 1024x1024)
For me...
>>
>>103190954
nasty / deformed looking feet, wtf is this gen
>>
So what's a good model in around the 7B-13B range that is capable of basic erotic RP/discussions?
For the record my current guess is some kind of LLama3 fine-tune???
I want to use it to power my sex doll by putting a BL mic/speaker in her head or around her neck.
I would use a 9DOF sensor with a BL module around her hips to detect when I'm fucking her (movement detection).
Basically the software would have a heuristic to play pre-recorded moans when she is getting fucked, but I want her to also be capable of answering a question while fucking or when we are cuddling. Perhaps have the LLM change some settings using function calling.

Because of this I need to run the model relatively fast so it needs to fit into my 16GB 4070 Ti Super. That way I could account for some quirks of the model by using a classification step and alternating between prompts or multi-agent shit or something.

Also, for my first prototype I want to make it entirely local because unless necessary I would hate to expose dirty talk and sexual stuff to a cloud provider so I'm going for a local ASR -> LLM -> TTS loop.
My biggest concern is the ASR, but let's disregard that for now.
If my shit works I'm sharing the tech on GitHub faggots so please be kind and halp a bitshit crazy dude.
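Not a model rec, but the loop I'm describing is basically this; every function below is a made-up placeholder, not a real API, just to pin down the ASR -> classify -> LLM -> TTS architecture:
[code]
# all functions are dummy stand-ins; swap whisper / your LLM backend / your TTS in behind them

def asr(audio):                 # speech -> text
    return "how are you doing"

def classify(text):             # cheap routing step, e.g. "question" / "command" / "smalltalk"
    return "smalltalk"

def llm(history, text, mode):   # local LLM call; one prompt template per mode, function calling for settings
    return "doing fine~"

def tts(text):                  # text -> audio
    return b"<wav bytes>"

def main_loop(get_event, play):
    history = []
    while True:
        kind, payload = get_event()   # ("motion", None) from the 9DOF sensor, or ("speech", audio) from the mic
        if kind == "motion":
            play(b"<canned moan>")    # heuristic path, no LLM round-trip
            continue
        text = asr(payload)
        reply = llm(history, text, mode=classify(text))
        history.append((text, reply))
        play(tts(reply))
[/code]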
>>
>>103191005
>bitshit crazy
*batshit crazy
Sorry, typo.
>>
>>103191005
Opus is pretty good, have you tried it?
>>
File: ComfyUI_00103_.png (1015 KB, 1024x1024)
>>103190986
For me...
>>
>>103191005
>ASR
dafuq is asr?
>>
>>103190373
Rude. She'll be fine though as FLAOT has contributed to many technological advancements.
>>
>>103191005
GPT-SoVITS is the best local TTS
For LLM, you aren't going to fit anything non-retarded into 16gb.
>>
File: ComfyUI_00107_.png (1.04 MB, 1024x1024)
For me...
>>
>>103191005
Try ministral 8b for speed.
>>
>>103191123
please use a better model that can gen feet retard
>>
>>103191123
Need one with a yellow or black Nissan R34 pls
>>
>>103191077
I am imagining Fernando Alonso with a big-booba Miku and Kurisu beside him.
>>
>>103191032
Not yet. If you mean these: https://huggingface.co/collections/dreamgen/dreamgen-opus-v1-story-writing-and-role-playing-models-65d092a6f8ab7fc669111b31
I'll check them out, thanks.

>>103191127
Thanks

>>103191105
ASR = Automatic Speech Recognition
Essentially, it just converts speech to text.

>>103191113
>GPT-SoVITS
I have seen this before, but I haven't tried it yet. So thanks for the heads up. Looks promising, especially if the voice cloning works fairly well. I could get a lot of clean audio from VR videos (the female talent is close to the mic and the male doesn't speak in most POV stuff) and video games (usually has separate clean audio track in the game files).

>For LLM, you aren't going to fit anything non-retarded into 16gb.
I am/was afraid of that. I'm merely targeting local only first to see how much I can push it with a small local setup. In the end I could always just point my shit at a cloud endpoint or rent a GPU (something like runpod). Anons with a large homelab can always point shit like this to their local setup, but unfortunately I'm not currently in a position to get ~100GB of VRAM.

I have tested some ideas with a discord bot I wrote (text only) and so far some of them seemed to work quite well. I need to do more tests on smaller models, but I have a few ways to select a specific prompt while keeping the context small. Essentially what I'm prototyping now is what you could call an NPC with a behavior tree so I'm not allowing the LLM to stray away. I'm doing multiple calls per input to profile/analyze/tag the input and even rewrite it, but it's currently in a very PoC state. Probably won't work out (reliably enough) in the way I want.
>>
>>103191289
>ASR = Automatic Speech Recognition
whisper is the STT engine that's universally used.
>>
File: ComfyUI_00122_.png (1.08 MB, 1152x896)
For me...
>>
File: ComfyUI_00127_.png (1.16 MB, 1152x896)
For me...
>>
>>103191345
>>103191400
are you just spamming now or
>>
>>103191408
no, I'm done now.
Autism just wouldn't stop
>>
Adaptive Decoding via Latent Preference Optimization
https://arxiv.org/abs/2411.09661
>During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO) a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.
neato
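If I'm reading the abstract right, the mechanism is roughly this (toy sketch, not the paper's code; the LPO training of the head is the actual contribution and is omitted here):
[code]
import torch
import torch.nn.functional as F

TEMPS = torch.tensor([0.1, 0.5, 1.0, 1.5])   # discrete temperature choices (values made up)

class AdaptiveTemp(torch.nn.Module):
    def __init__(self, hidden_dim, n_temps=4):
        super().__init__()
        self.head = torch.nn.Linear(hidden_dim, n_temps)   # tiny extra layer on top of the LM

    def forward(self, hidden):                  # hidden: [B, D] last hidden state for the current position
        choice = self.head(hidden).argmax(-1)   # pick a temperature per token/example
        return TEMPS.to(hidden.device)[choice]  # [B]

def decode_step(lm_logits, hidden, temp_layer):
    t = temp_layer(hidden)                             # [B]
    probs = F.softmax(lm_logits / t[:, None], dim=-1)  # temperature chosen by the model itself
    return torch.multinomial(probs, 1)                 # sampled next-token ids, [B, 1]
[/code]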
>>
>>103190706
>>103190837
>>103190954
>>103191077
>>103191123
>>103191345
>>103191400
Actual brainrot.
>>
>>103190913
much less then a week and yea its incredibly depressing idk what else to say you can go and learn all this shit in 2-3 months and implement it yourself train a model from scratch an hour on a h100 is like 3 dollars theres plenty of improvements in training too that cut shit down a fuck ton
but do you want to damn a soul to this shit hole ? where every jew alive will try to lobotomise every rostie will use it as a cuck to complain to and write another 50 shades of gray bullshit with where every nigger will use it to sum 2+2 again and again and again and again
i dont maybe this is for the best who knows i got visions a couple of years back in the end i think it will all end well
>>
File: gemini-claude.png (65 KB, 1065x408)
>Gemini-Exp-1114
>+2 Elo with style control on lmsys over predecessor
>32k(!) context, which implies that it's YUGE
>calls itself Claude, an AI assistant made by Anthropic
>likely https://x.com/Yampeleg/status/1855371824550285331
Googlesirs... Our models have plateaued... We will be likely surpassed by free llama4 in Q1 2025... How will we ever recover?
>>
>>103191786
SIR! Google Gemini very good, made by talented Google technical AI engineeers plearse to not look style control sir to not look sir. We beat OpenAI Gemini best languende model haha google search superpower company
>>
>>103191786
Um no, plateauing just means everyone gets to around equal footing eventually. If there is any surpassing, it'll probably be by some small percent, and probably not in all intelligent tasks. Maybe there will be a breakthrough again like transformers but for now this is what we should expect.
>>
>>103191841
>Um no, plateauing just means everyone gets to around equal footing eventually.
I think plateauing means that the best can't get any better; if you can't improve an architecture anymore you have to look elsewhere. Desu I don't think we are plateauing yet, there are still improvements to be made in the training method and the data quality/filtering
>>
https://github.com/linkedin/Liger-Kernel/pull/362

Poggers?
>>
People complain a lot but I think AI is making good progress. I program with AI every day and it's very good.
>>
>>103191862
>People complain a lot but I think AI is making good progress.
maybe that's why AI is making good progress, because people complain a lot, when you have high standards, you are bound to achieve them
>>
>>103191786
seems like a chatgpt latest like finetune
I would not be surprised if it scores even worse on the livebench
>>
File: 1731461884823447.png (574 KB, 512x768)
>>103191862
People will continue complaining about AI until it replaces them.
>>
Futa is gay.
>>
>>103192024
I dislike Futa
t. shota master race
>>
Futa on female is less gay than vanilla sex.
>>
>70 totally organic posts during dead hours at a time when /lmg/ is dead
>>
>>103192049
check inside your anus
>>
>>103190878
In llama.cpp/GGML the softmax for the cross entropy loss is never explicitly written to memory in the first place but always recomputed on-the-fly.
So applying this technique would not yield any memory savings but potentially some better performance.
Though I have concerns about the numerical aspects of using the logit of the ground truth as the softmax scale rather than the highest logit.
My intuition is that a FlashAttention-like approach with a fixup to combine partial results would be more numerically stable.
(I have at this point not read the paper.)
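To illustrate the concern (my reading of it, I haven't read the paper either): scaling by the ground-truth logit can overflow whenever the model is confidently wrong, while scaling by the running max keeps every term in (0, 1]:
[code]
import math

logits = [30.0, -5.0]   # model is very confident about token 0...
truth = 1               # ...but the ground-truth token is 1
# scaled by the ground-truth logit: exp(30 - (-5)) = exp(35) ~ 1.6e15, overflows fp16 (max ~65504)
print(math.exp(logits[0] - logits[truth]))
# scaled by the max logit (FlashAttention-style, with a fixup when a later chunk raises the max):
print([math.exp(x - max(logits)) for x in logits])
[/code]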
>>
>使用种族歧视语言和有意骚扰其他玩家的行为会受到严厉制裁,包括但不限于账号封禁。请尊重他人并遵守Steam社区准则。 (roughly: "Use of racist language and deliberate harassment of other players will be met with severe sanctions, including but not limited to account bans. Please respect others and follow the Steam Community Guidelines.")
Why does Mistral Nemo occasionally speak Chinese on AMD GPU? I don't care much about getting it running well on AMD, but I am genuinely curious what could be the cause
>>
File: 1000001740.jpg (73 KB, 538x679)
>>103191861
lmgpaganda
dont let anyone convince you it isnt over
>>
>>103192090
Hey CUDAdev, you posted a bunch of fundamental llm articles before...do you have any more links? I'm at a point where I really need to learn the mathematical and theoretical foundations to move forward.
>>
>>103190878
this was a cool paper, ty for sharing anon. Not sure if I understood everything correctly, but this seems like an approximate softmax CE with tiling+fusion - similar to the optimizations in flash attention 1? I wonder if this would allow full fine-tuning on consumer gpus to be more feasible
>>
>>103192116
you are gay :)
>>
>>103191005
Here is the non-meme answer. For ASR use Whisper-turbo, it's fast enough for real time and lightweight. Using faster-whisper library should improve that speed even more.
The current sota that fits into 16GB of VRAM is Rocinante-v1.1 12B (mistral nemo finetune), it's really good for its size and you won't find anything better.
And for the TTS as they said GPT-SoVITS is the current best. With a bit of tweaking, it takes 9s to generate 35s of audio on an old T4 (should be faster on your setup).
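For reference, the ASR half with faster-whisper is only a few lines; sketch below, and the turbo checkpoint name is from memory, so double-check it against the faster-whisper README:
[code]
from faster_whisper import WhisperModel

# checkpoint name from memory, verify against the faster-whisper model list
model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

segments, info = model.transcribe("mic_chunk.wav", vad_filter=True, language="en")
text = " ".join(seg.text.strip() for seg in segments)
print(info.language, text)
[/code]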
>>
>>103192169
NTA https://d2l.ai/
>>
>>103192169
>you posted a bunch of fundamental llm articles before
I unfortunately do not remember what you're talking about and I read comparatively few papers about language models in the first place.
Generally speaking, I gained my theoretical knowledge from attending lectures and my practical knowledge from working on projects.
And for what I'm doing the only really relevant theoretical knowledge is I think linear algebra, numerical analysis, and statistics, and those things are not specific to language models.
>>
>>103192169
https://mml-book.github.io
I liked this book, it's not super long/verbose compared to ESLI & ESLII and starts from first principles
>>
File: 00016-1158684101.png (1.82 MB, 720x1328)
Just like how early jpeg/mpeg/mp3 style compression artifacts and bitcrushed sounds of the early internet came to be used as artistic expressions eventually, do you think ai art artifacts and weirdness will be used as an artistic device in the future?
>>
File: hqdefault~2.jpg (21 KB, 209x360)
What's a good alternative to AI dungeon?
>>
>>103192329
Have you tried Erebus-2.7B?
>>
>>103192332
Not yet, I'll check it out later, thank you
>>
>>103192329
Cleverbot
>>
>>103192312
It already is. Artifacts in images, video and even audio can be interesting to observe. Not so much with text. I don't really like those perfect, indistinguishable AI videos. I much prefer the dream-like weirdness.
>>
I need some advice. What's the best way to run on high ram and a single gpu? Surely there must be consumer solutions that stream layers into vram? According to my undergrad understanding llamacpp isn't /true/ async streaming, am I wrong?
>>
>>103192399
>What's the best way to run on high ram and a single gpu?
Loading what you can on gpu and running the rest on ram, as everyone does.
There's no streaming as i understand it. Whatever can be loaded on gpu is loaded, the rest is computed on ram. I suspect the layer swapping overhead between ram and vram would be too big to consider it, if that's what you mean.
>>
>>103192312
>do you think ai art artifacts and weirdness will be used as an artistic device
They've been used since before gpt2
>>
File: imageprocessingtest.png (681 KB, 1532x838)
We truly are living in the future
>>
>>103192430
I thought that with parts of model in ram we are loading the layer to gpu when it's needed for an immediate computation. Instead we can load it when it will soon be needed while gpu is occupied. Good ram can do 80 gb/s, pcie much more, so you get decent inference time given that the gpu is fast enough. Maybe I'm missing something.
>>
>>103192532
>Good ram can do 80 gb/s
Only a tenth of what you'd need. Maybe if you have 12 channels of that you'll be doing ok. Look at EPYC Turin chips
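Back-of-envelope, assuming token generation is purely memory-bandwidth bound (it roughly is) and every generated token has to stream all the active weights once:
[code]
weights_gb = 40             # ~70B params at ~4.5 bpw (Q4_K_M-ish)
dual_channel_ddr5 = 80      # GB/s, typical desktop kit
twelve_channel_ddr5 = 460   # GB/s ballpark for a 12-channel DDR5 server board
print(dual_channel_ddr5 / weights_gb)    # ~2 t/s ceiling
print(twelve_channel_ddr5 / weights_gb)  # ~11 t/s ceiling
[/code]
Real throughput lands below those ceilings, but it's the right order of magnitude.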
>>
>>103192545
Like 1 tps for 70b, and still faster than computing on cpu?
>>
>>103192563
>faster than computing on cpu?
if you're offloading more than 20% of the layers onto cpu, you're essentially running on CPU
>>
>>103192532
Moving ram layers to the gpu will take just as long as moving those weights to cpu registers for processing. Reading ram is slow, either for transfer to gpu or computing.
>>
>>103192572
>>103192579
So, long story short, default lmstudio setup is my best bet?
>>
>>103192598
I don't use lmstudio. Chances are that the defaults are good enough.
>>
>>103192604
>i don't use it but maybe it's fine
I'd like to know what people use and why
>>
I kinda understand now what that shitposter meant, when he said mikuposters are insane.
>>
>>103192642
llama.cpp. lmstudio depends on llama.cpp. I rather use llama.cpp directly. Generally, i want to have the least amount of stuff between me and whatever i want to use. If i could run models with bc (or any calculator) reasonably fast, i would.
>>
>>103192671
Ok, thank you. I went asking around because I saw billions of projects that promise 100x llamacpp inference speed. It's mostly bullshit, but who knows.
>>
File: jarvis.png (208 KB, 1086x641)
>>103192747
Something better may come along, but the field is full of grifters.
A bad example of the grift, but you get the point...
>https://github.com/calebnwokocha/llama.cpp
>>
Context is the biggest issue with LLMs. I feel literal anxiety when I am really into a character, but the token count is reaching its limit and the character begins to change and forget.
>>
>>103192789
Kek
>>
File: granny_dre.png (324 KB, 640x626)
>>103192815
>>
File: 1722663648099686.png (1.7 MB, 1200x621)
>>103192789
>jarvis.cpp
>>
>>103192841
It's a cruel duality of wanting to spend more time with the character while also slowly killing it by adding more tokens to the context
>>
>>103192854
>He's not backing up the entire conversation for when he'll have infinite context
ngmi
>>
What do we think about the E2 F5 tts/voice cloning? I think it released a week or two ago
>>
>>103193074
Cannot form opinions yourself? If you tried it, you know how good or bad it is. If you haven't, you should.
>>
>>103193074
Already depreciated by GPT-SoVITS
>>
>>103189783
cydonia is surprisingly usable for me right now
Nemotron is smart, but I have to run everything at 4 bit on my 3090 and the most I can get is 2T/s. It also has a tendency to bring up the same expressions
Tried magnum v4 72B but it was just a worse qwen, I couldn't even import the instruct template, not sure what's up with that
Rocinante is fast as fuck and surprisingly coherent for a 12B model, but it suffers from meh intelligence and the recall is apparently not that great (though it's been working flawlessly in my test scenarios <8k)
Cydonia is still fast, but more intelligent. It's not on the level of nemotron or qwen, but I can do 10 rerolls in the time it takes 70B to do a single one. Interestingly enough, it's the only model I've tried so far that sticks to the character card for more than 2 turns, the other models I've tried quickly make the characters generic (a completely unhinged and extremely angry demon who hates humans with a passion suddenly helping me after 2 turns? Fuck that man, give me a struggle)
So either quanting is far more damaging than I thought or 70B ain't it
>>
>>103193088
What flavor of cydonia/quant?
>>
>>103193084
I'm asking if it's worth setting up. If it's just as bad or worse than everything that came before, why bother I already tried those.
>>
>>103193086
>GPT-SoVITS
fuck there are so many models now
can you give me a qrd? I think it's kinda hard to work with
>>
>>103193110
venv and a few minutes downloading things. Compare them yourself. gpt-sovits worked fine for me.
>>
>>103193102
v1.2 Q6KL, I was thinking about going for Q8 but Q6KL should be near lossless and I'd rather run the KV cache at a higher precision
>>
>>103192998
>>94536113
>I only have 2 Gb of VRAM, but I have 64 Gb of main RAM.
Yep, this is the real petra. It also matches this screenshot: https://archive.4plebs.org/pol/thread/487155078/#487186513
>>
Is pinokio good enough to make a lot of different environments easily or is it better to install things manually?
>>
>>103193228
Either use whatever the project recommends on just venv. Adding extraneous shit between you and the software is rarely worth the effort.
I've never used anything other than venvs and if i need a specific version of python i don't have, i compile it.
>>
>>103193154
Thanks. You're using chatML?
>>
>>103193228
Use conda or even venv. Do not install things manually and don't use scam shit like pinokio.
>>
>>103193264
Nah, mistral for now, it seems to work well enough
>>
>petra is a kobold discord shill
who would've thought kekypow
>>
>>103193110
>>103193074

You can use pinokio to download it (https://pinokio.computer/) if you really want, though install it normally if you can; I noticed a slight performance hit when using pinokio.
As for cloning, E2/F5 is the best, don't listen to the GPT-SoVITS shilling. SoVITS's English version didn't work for me so I had to brute-force the Chinese one, and I never trained a voice beyond the default settings because I couldn't be bothered to brute-force it again after closing it and forgetting how everything went.
F5 is literally almost perfect, but it still has that problem where it talks too fast, and toggling the "remove silent parts" thing won't fix it. Also, don't leave gaps of silence in the voice sample or it will mess up the output: you get 5 seconds of audio, 2 minutes of silence, then the last 5 seconds at the end. E2 is worse than F5 but it speaks at a normal pace and clones pretty okay; personally it's good enough for me, you can just chuck text in and make an audiobook with it no problem.
The only issue is that it's slow: on my 3060 laptop it would take around 4 days to process around 440k words.

One more thing about GPT-SoVITS: /mlp/ has a voice cloning thread and they made their own UI, maybe theirs works better, I haven't tried it. From the samples others have posted, their tunes are never as good as F5 or E2, but they're passable and much, much faster, so try that if you want. Some say SoVITS handles moans and similar sounds better; it sometimes does and sometimes it just messes everything up, so I wouldn't count that as a plus.
>>
>>103193881
>schizobabble
Yeah sovits is better.
>>
>>103193915
>>103193881
>>103193074
Buy a fucking ad.
>>
File: 1717476284437276.gif (2.11 MB, 640x362)
>>103193971
>>
anyone using textsynth server?
>>
>>103192103
Pretty sure shit like that is caused by meme samplers and temp being too high. babby stuff.
>>
>>103194231
Fabrice Bellard is a cool dude. He gave us tcc, ffmpeg, qemu... but i don't care about online services.
>>
>>103194416
I meant the self hosted version and how it compares to llama.cpp
>>
>>103194576
I see. No source code. I don't care. Why don't you just try it?
>>
>>103194409
temp 0.5, minp 0.01, rep_pen 1.11 range 500, disabled everything else.
>>
>>103193208
>obsessed
>>
>>103194653
Was planning to, just wanted to see if any anons used it.
>>
File: benchmark.png (258 KB, 1640x1176)
>we made another oversized starling that totally beats gpt4 and llama 405b
What's even the point of this? Just another investor scam?
>>
Are local models still subpar GPTslop in terms of prose quality?
>>
>>103194894
BMT is still the best local has to offer, so yes
>>
>>103194894
Sadly yes, but we caught up to old GPT4 on intelligence with llama 405 and Largestral.
>>
>>103194892
umm...
>Looks like the model in the screenshot is a quantized version, It's kinda hard to control the behavior under quantization as the training is done in 16bit. Plz feel free to try unquantized version of the model in direct chat on lmarena.ai (though we did not change the model identity for this round so it still thinks it's Qwen)

https://www.reddit.com/r/LocalLLaMA/comments/1grcx0h/nexusflow_release_athenev2chat_and_athenev2agent/

New finetoon cope dropped
>>
File: file.png (132 KB, 761x770)
>>103194920
>>
>>103194918
From what I’ve seen, even for assistantshit or cooding, some much smaller models seem to be doing great.
But every time I tried a local model for creative writing, it felt like old GPT models (They made a new one that is decent prose wise) but even worse.
>>
>>103194920
>It's kinda hard to control the behavior under quantization as the training is done in 16bit.
fucking gaslighting bitches, most models work fine at Q5+, they're just trying to find excuses for why their model actually sucks, it's Matt Schumer levels of scam all over again
>>
>>103194920
>new
wasn't it common knowledge at this point that the current gen of llms (3.1, qwen2.5) don't quantize well at all even at int8?
>>
File: kek.png (112 KB, 740x788)
>>103194945
Even a year ago already coping
>>103194986
>current gen of llms (3.1, qwen2.5) don't quantize well at all even at int8?
cope for bad models, you know people will run quants, if your model doesn't handle being quanted it's dogshit, simple as
>>
>>103194995
>if your model doesn't handle being quanted it's dogshit, simple as
amen
>>
I believe AGI is possible in 24GB of VRAM.
>>
>>103195008
Qwen-2.5-72b-coder-Bitnet, boom, AGI in 24 GB of VRAM
>>
>>103194986
>wasn't it common knowledge at this point that the current gen of llms (3.1, qwen2.5) don't quantize well at all even at int8?
Cope. Largestral has no problems with quanting.
>>
>>103195012
24GB should be able to push 100GB with BitNet.
>>
>>103195031
100B* woops
>>
>>103195031
I did the math and it was 91b, if you don't count the vram required for inference though, that's why 72b is a good spot, it leaves some room for the context shit
>>
Controversial opinion but I firmly believe that we need better local models
>>
>>103195060
kek
>>
What's currently the best model for noise detection? I have long audio files(1h+) with a lot of background noise that gets transcribed by whisper as speech and I want to cut it out automatically.
>>
File: 1718286702402876.png (457 KB, 1710x822)
https://xcancel.com/akyurekekin/status/1855680785715478546#m
>Just take a few gradients during test-time — a simple way to increase test time compute — and get a SoTA in ARC public validation set 61%=avg. human score!
holy shit
>>
Are there any tangible and notable advancements in the last few months?
>>
>>103195114
I assume you already tried old-school tools like audacity for that. gpt-sovits has a noise removal module (used during training, maybe you can isolate it), but i don't know if it can deal with such long files.
>>
>>103195274
The final piece of the mosaic to achieve AGI
>>
>>103195275
you don't need anything but 'transformers and make it bigger and train it longer'
>>
>>103195274
>another reflection/entropy tier grift
it's all so tiresome...
>>
>>103195323
At least they have *some* numbers to show. Entropix only managed to show a 1B that can count Rs.
>>
>>103195031
>BitNet
If it's as revolutionary as people claim, why has it been over a year and there's no GPU implementation, no real usable bitnet models?
>>
I'm pretty sure it's a goal of some people in this thread to get others to fill up their hard drives with pointless bullshit that isn't actually better than what everyone already has.
>>
>>103195492
>over a year and there's no GPU implementation
To run a 4B model? It works the other way around.
>>
>>103195492
>>103165113
>>
>>103195492
It's actually baffling
>>
>>103195516
Microsoft, Meta, and Apple do not run a charity for Nvidia's benefit and would happily ditch them at a moment's notice if something better came along.
The *only* reason Nvidia has so much leverage to begin with is that they're the only option.

No, the most likely reason is that BitNet isn't as good as people claim, or that it seems too risky an investment when they already have architectures which they know perform well "enough".

I would guess there will be more interest in alternative tech of this type once they realize they've exhausted what they have.
>>
>>103195555
Yeah let's compress the same Internet 100 times in different resolutions that will be obsolete in a week bro. That's definitely better use of resources than trying out novel research ideas
>>
>>103195114
Probably silero vad
>>
>bitnet
china is cooking, qwen3 will be bitnet, trust the plan, 2 more weeks, etc
>>
>>103195594
qwen team acknowledged bitnet when it came out, been radio silence since
>>
https://www.techpowerup.com/328837/gigabyte-launches-amd-radeon-pro-w7800-ai-top-48g-graphics-card
>GIGABYTE Launches AMD Radeon PRO W7800 AI TOP 48G Graphics Card
interesting
>>
>>103195594
qwen3 will be omni not bitnet
>>
>>103195641
>Radeon
lol
>>
>>103195492
I don't get it, every single company doesn't want to touch BitNet with a 15 foot pole even though the first one who manages to make a decent model out of it will be remembered in history forever
>>
File: download.png (2 KB, 300x80)
>>103195748
>>103165386
>>
>>103195748
Almost as if its a dead end.
>>
>>103195775
yeah and? why would companies give a fuck about Nvdia, the majority of them hate buying for overpriced GPUs, that's a win win for them
>>
>>103195641
>price nowhere to be found
Who cares?
You can already buy GPUs with 48 GB VRAM or even more, the problem is that they're too expensive.
>>
>>103195785
Why would they even risk training the BitNet model? Nvidia is known for putting companies back in line when they try to negotiate with AMD
>>
>>103195822
>Why would they even risk training the BitNet model?
Meta has enough cards to not having to deal with Nvdia ever again lol
>>
>>103195795
Companies will buy those cards, eat shit with software support, and dump them at junk prices on eBay, just like what happened with Mi60
>>
>>103195822
https://github.com/microsoft/BitNet
Microsoft literally developed this. When is Nvidia going to stop selling them cards?
>>
Livebench JUST added Qwen 2.5 7B. What's taking them so long to benchmark the entire Qwen series?
>>
>>103195836
Meta is going to buy a bunch more GPUs in 2025
>In its Q2 earnings release, Meta also commented that its infrastructure cost expense will significantly rise in 2025. This is clearly tied to its computing power build-out to create the best AI model it can. Nvidia will be a primary beneficiary of this, making it an intriguing stock for 2025.
>>
>>103195854
NTA but wake me up when they give us a usable model. Code is cheap.
>>
>>103195854
When they release a capable 70b BitNet model
>>
>>103189328
cancer
>>
>>103195748
people don't want to hear it but this is probably more likely because they've tried it and determined it's not really worth it rather than nvidia conspiracies or hating local users
>>
>>103195822
>Nvidia is known for putting companies back in line when they try to negotiate with AMD
that's the point, if they manage to make BitNet work, they won't have to rely on Nvdia's bully tactics anymore, there will be more competition, it'll be the true boom of AI
>>
verdict on cogstudio?
Is it worth going through another BS install to use if you've got a 24gb GPU?
>>
>>103189328
>Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>Hunyuan-Large released with 389B
What's the incentive of making that? It's clear no one would be able to run it except corpos, and corpos have no incentive to run that because Llama3 is better. It's a literal waste of computing power.
>>
what's the best thing I can fit into a 24gb gpu?
>>
>>103196008
>no one would be able to run it
with cpu offload, giant MoE models are actually a really good deal if you have the memory to load them.
cpumaxxers tend to run them
>>
>>103196028
This. Just a cheaper way to run huge param models since you can get decent speeds on cpu.
>>
>>103195942
How long will it take to design and produce the hardware?
>>
>>103196028
>>103196070

That's good except CPU runs 70B models at 2 t/s at most. 8x7B? Fast. 8x13B? Good quality for performance. 8x20B? Already a bit too slow. 8x70B? Unfeasible, especially if it runs 2 experts so effectively is a 140B.
>>
File: 1607026237334.gif (1.49 MB, 400x560)
>>103196008
>Sarashina2-8x70B
I'm a little late to the party. Anywhere to test drive this before I try and run it locally?
>>
>>103196094
2t/s is fine for one user, speeds above say 10t/s is about as fast as we read anyway. Not usable for servers with multiple users of course.
>>
>>103196094
>That's good except CPU runs 70B models at 2 t/s at most.
That's definitely a (you) problem
>>
>>103196094
How one can enjoy 2t/s
>>
The Japanese output of ezo 72b is so good once you redpill it a bit (its mildly pozzed on a null sampler and empty prompt)
Would it be possible to output jap and have a smaller translation model turn it into less sloppy english? maybe that's more doable from another language that maps to english a bit better?
>>
>>103196151
Sure, if (I) am the laws of physics
>>
>>103195748
>every single company doesn't want to touch BitNet with a 15 foot pole even though the first one who manages to make a decent model out of it will be remembered in history forever
Bitnet things.
- Massive training cost, low inference cost
You'll get a PRODUCT on the MARKET faster and cheaper by doing old shitnet instead of new bitnet.
- Pointy Haired Boss concerns
How much more expense is there to make it Safe™ and Aligned™?
How expensive will it be to memory hole double plus ungood truths that The Party might discover after release?
- Is it even useful?
We're still in an era where there has been no true Killer App in AI.
You can do some fun art things but it's already politicized between copyright lawyers thirsting for moneyblood and Luddites throwing their wooden shoes at fun prompters.
Text is starting to get okay for translation but we've had machine translation for a long time using more easily controlled methods. (LLMs just kinda do what they want, hallucinating or mixing up context at will.) Unless it can start doing really impressive and useful things, like telling you that your dog wants steak or letting you ask your cat to stop scratching your sofa when there's a scratching post right there next to it, it's not a game changer.
Voice has been mostly getting shade since for every funny music parody we have a hundred scammers trying to turn a dime with a voice clone. Conversely, it's really nice at transcribing audio to text but that's an accessibility feature that's already normalized since Google/YouTube's been on top of it for a while.
Music, like image art, is even more about lawyers and arguments about taste.

Till there's a sure place in the market that will pay the heightened training costs, there isn't much reason for a big company to invest in 1.58 when there is steady, easy, risk free iteration on shitnet models. As long as they can keep posting slightly better than the other guy benchmark numbers, they shall continue to do so.
>>
>>103196150
>>103196174
2 t/s is literally unbearable. 3 t/s is suffering. 4 t/s is difficult but tolerable. 5 t/s is okay. 6 t/s and more is good. The suffering grows exponentially as you approach 0 t/s, it's not linear.
>>
>>103196192
>laws of poverty
with the right cpumaxxing setup you can get 8t/s+ on 70b
>>
File: image.png (126 KB, 796x1272)
What did they mean by this?
>>
>>103196017
Magnum 27B
>>
>>103196218
Yeah sure, how do you get RAM bandwidth above 7000 MT/s when it's (pretty much) the very top option available to general consumer?
>>
>>103196195
>How expensive will it be to memory hole double plus ungood truths that The Party might discover after release?
That's an easy one, just filter the training data.
>>
>>103196119
i'm pretty confident it's a nothingburger, they probably artificially inflated it to make it look better to investors
>>
>>103196230
hm, I think I heard good stuff about this one. is there anything smaller if I want more context?
>>
>>103196216
This man knows of what he speaks. And just to add, if you want to set up something with TTS that isn't absolute dogshit, 8 t/s is the absolute minimum.
>>
speaking of livebench, i noticed that qwen2.5 7b is better than the latest 3.5t, and qwen2.5 72b is better than the latest 4t

local hasn't won yet but we are doing fine i'd say
>>
>>103196195
>How much more expense is there to make it Safe™ and Aligned™?
>How expensive will it be to memory hole double plus ungood truths that The Party might discover after release?
A non-issue for actual decent human beings, racist chuds stay seething <3
>>
>>103196286
nah there always will be a 'lag' for local but that's an acceptable tradeoff as long as there's still progress
>>
>>103196236
skill issue. consume better
>>
>>103196245
Your reading comprehension is lacking.
>after release
Emmanuel Goldstein's agents are always working to subvert The Party. So even though you've filtered the training data, the Brotherhood will have nonetheless found ways to cause Wrongthink to emerge and we must be ready to respond by correcting the model. If the model can't be corrected, then we will need to increase the chocolate ration from 4g to 3g.
>>
>>103196286
Now if only we could also get a model that trades blows with Claude at creative tasks.
>>
>>103196286
Qwen-2.5-Coder-72B wen?
>>
>>103196286
It's great but Livebench is a bit biased towards coding and academic type knowledge. We need a niche knowledge benchmark.
>>
>>103196304
the problem isn't the lag, is the supposed "moat" (which is a meme as we can see).

if we get the same thing corpos get but after some time and it runs on consumer hardware, then it's a win for us
>>
File: image7446[1].png (1.23 MB, 930x1172)
>>103196236
Git gud at overclocking
>>
>>103196317
I don't know but I'm hopeful that it might not suck.
I threw my usual cursory programming questions at 32B at Q8 and it flopped on shit that Llamas handle well. Python 101 was passable but complicated Python and Java refactoring were both a bust. I was expecting fire and I got a fizzle.
>>
>>103196348
>We need a niche knowledge benchmark.
imo it's retarded fitting niche knowledge inside models, i think people will understand this in the long run. stuff like rag, infinite context, ttt, etc... can all solve the niche knowledge "issue" while keeping the actual "reasoning" core small

in 5 years it's gonna be laughable how ancient the current tech was
>>
>>103196365
>oc championship
speedtranners look sane in comparison
>>
>>103196375
>in 5 years it's gonna be laughable how ancient the current tech was
We won't mind as long as we're laughing at it alongside our ai-wives.
>>
>>103196372
Why are redditors praising so much then?
>>
>>103196375
We'll have ASI in 5 years
>>
>>103196227
Elon: I'm cutting the funding until I'm certain you will stay nonprofit. If I keep funding you, you'll scam me.
Sam: suuurrreeee...
>>
>Adaptive Decoding via Latent Preference Optimization
https://arxiv.org/abs/2411.09661
> During language model decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO) a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.
I don't remember if papers anon posted this.
>>
>>103196303
>actual decent human beings
That's a weird way to say niggercattle.
>>
>>103196375
>stuff like rag etc... can all solve the niche knowledge "issue"
fuck no they can't once the model hits the rag database to see some niche info it more than likely already said something retarded in the last message, refute this

I want the model to know and be able to reference say for example what mesugaki is without me having to tell it, how do you do that with rag?
>>
>>103196386
Yeah, OCing at the level where you need LN2 is gay, but there are more reasonable setups to get DDR5 to 9,000 MT/s and beyond.
>>103196303
Shitty bait
>>
>>103196404
>Why are redditors praising so much then?
>>
>>103196236
>general consumer
cpumaxxer with 12 channel DDR5 is anything but general consumer
>>
>>103196451
>>Why are redditors praising so much then?
>>
>>103196409
nothingburger, we need new paradigm asap

>>103196426
>once the model hits the rag database to see some niche info it more than likely already said something retarded in the last message, refute this
unironically skill issue. also
>shitty rag we have now will never improve

>mesugaki is without me having to tell it, how do you do that with rag?
when coomers finetune models on mesugakis you are implicitly "telling" the model that you want it to focus on cooming. adding this information in the prompt/rag db/whatever is exactly the same thing in practice
>>
File: purpleguy.jpg (31 KB, 602x357)
>>103196365
cool, it's still 3 t/s
>>
>>103196375
No, it's already been shown multiple times that models perform better at tasks involving some knowledge when they saw it during training compared to when they are given the information in context. If you think about it, it makes sense why.
>>
>>103196404
Perhaps because they saw big numbers on the benchmarks?
Do they even make software or just talk about it?
My Java refactor test is drawn from an actual issue I had. It's not very complicated, but I had it as a copy paste edit job because it was a lot easier to just do that than solve the actual problem, which involves Java's most notorious warts, the primitive versus boxed bullshit and the arrays aren't containers bullshit.

L3.1-Nemotron mildly impressed me, because while it's a bit too chatty on simple questions it detected, explained, and worked around those issues before outputting any code. A few other L3's got it right, but most would need to be told that they got those details wrong before then issuing a reasonable fix.

Hopefully if I get off of my ass and get back on my projects I'll have a more comprehensive collection of code problem tests.
Maybe I'll make a program for that. It isn't too hard to automate sending shit to and from LlamaCPP (or a running Kobold instance) right?
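Right, it's just an HTTP POST. Minimal sketch against llama-server's /completion endpoint (KoboldCpp exposes a similar /api/v1/generate if you'd rather point it at that):
[code]
import json, urllib.request

# talks to a llama-server instance started with e.g.:  llama-server -m model.gguf --port 8080
def ask(prompt, n_predict=512, url="http://127.0.0.1:8080/completion"):
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict, "temperature": 0.2}).encode()
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

for question in ["Refactor this Java method: ...", "Explain the autoboxing pitfall in: ..."]:
    print(ask(question))
[/code]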
>>
>>103196468
current models using current training methods, yes. this doesn't change the main point: niche info should stay outside, we need better/faster/smaller "reasoning" cores
>>
>>103196456
>12 channel DDR5
I'm not sure if that's even technologically possible ATM. That will certainly require 2 CPUs.
>>
>>103196459
>>103196479

>adding this information in promt/rag db/whatever is exactly the same thing in practice
of course having to give the model an entire fucking dictionary with usage definition eating up context is the same as the model knowing when to use something naturally

if they manage to filter unsafe data properly, you'll use rag to teach models what a cock is I guess? Or waste context on explaining full human anatomy?
>>
>>103196459
>new paradigm
Boltzmann brain in a jar. Human-level general reasoning, spatial understanding, etc. (Effectively) infinite context.
>>
What advancement or feature do (you) predict for LLM's in the year 2025?
>>
>>103196483
You're on the right track
>>
>>103196491
you are still missing the point, re-read the whole reply-chain

it's retarded having a model that knows EVERY single niche thing, what we need are SMART models that can "learn" the niche things that you want (again, whether using a non-retarded rag, infinite context, ttt, etc...)

current llms are unironically hitting the ceiling in terms of reasoning, no matter how hard they push o1 and the "inference time compute" meme.
>>
>>103196483
https://rentry.org/miqumaxx
>>
>>103196538
>it's retarded having a model that know EVERY single niche thing
Yet Claude is the best and it's obvious they don't actually filter shit with some of the stuff it knows
>>
>be homo
>Fall for pre op mtf is just like a boyfriend meme
>Have a lovely date
>Next day they find out my hobby is AI
>Total troon rage, they burn the entire mother fucking friendship to the ground.
Ahh ahh mistress...
>>
File: screenshot.png (5 KB, 1480x80)
>>103194231
>textsyn
>>103196189
>a smaller translation model turn it into less sloppy english?
just tried the pair, piping ezo 72b ero output to the madlad400 7b translation model in textsynth
breddy gud considering the complete lack of effort required.
Shame that textsynth is binary only but it sure does fast, uncensored translation!
>>
>>103196512
BitNet implemented, next year for sure. And maybe a proper CoT model from one of the big players >>103196510. Nothing else other than that. Transformers are hitting a wall. Disregard breakthrough claims that involve
>samplers
>synthetic data including self-play or anything that makes an LLM rate your answers
>>
>>103196538
And obviously I'm talking about stuff that's possible now, not le magic bitnet just 2mw memes, like your ideal learning models
>>
>>103196556
claude is the best because it's probably 1T+ parameters and it uses the whole internet as its dataset

as i've said, it's retarded. i didn't say it doesn't work
>>
>>103196540
I don't think we even have a modern cpumaxxer here any more, since the Turin release.
Or did someone lurking here drop big cash to upgrade and hasn't said anything?
>>
>>103196479
What you are suggesting is equivalent to magical thinking. There is no world where a model can suddenly be good at something it has had limited time to learn, whether via infinite context methods, test time compute, etc., because those, as the names imply, use more compute, something consumers already have in limited quantity. You are essentially moving the "training" to the edge device (your computer) instead of the incredibly efficient supercomputers used to train these models. What COULD happen is a sparse architecture, like a MoE, where you can pick and choose the knowledge you want it to have. But that still requires the total parameter count at pretrain time to be the same (big).
>>
>>103196540
I mean, it's cool, but...
>$6k USD
...that's in no way a reasonable amount of money to spend on a PC. Unless you rent it out as a server, mine on it, or smth else.
>inb4 poorfag
No, I can technically afford that, but it just feels wrong. $6k on what is effectively just entertainment? Come on.
>>
>>103196303
Based.
>>
>>103196598
>and it uses the whole internet as dataset
if you're implying it does a search for every prompt it receives, I doubt it. If you're talking about pretraining data, then yeah, that's what our models could have as well if not for the retarded filtering. We need more data, not less, unless you consider phi to be the best thing ever.
>>
>>103196459
>nothingburger, we need new paradigm asap
this anon fucks
at some point someone has to call out this waste of GPU time on training. When you can oneshot-classify for "creative" and apply a multiplier to get your temperature, what did you achieve with your dumbass layer? It could have been an explainable layer instead of another black-box bullshit layer.
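To make the point concrete, here's roughly how dumb that replacement could be (toy sketch only; the prompt wording and the 0.3/1.1 values are made up, and it reuses a local llama.cpp /completion endpoint):
[code]
# Toy sketch of "oneshot classify, then apply a multiplier": ask the model once whether
# the request is creative, and pick the sampling temperature from the answer.
# All numbers and prompt wording here are made up for illustration.
import json
import urllib.request

def ask(prompt: str, temperature: float, n_predict: int) -> str:
    """Hit a local llama.cpp /completion endpoint and return the generated text."""
    payload = json.dumps({"prompt": prompt, "temperature": temperature,
                          "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request("http://127.0.0.1:8080/completion", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

def pick_temperature(user_request: str) -> float:
    """One cheap classification pass decides the temperature for the real generation."""
    verdict = ask(
        f"Answer yes or no only. Is this a creative-writing request?\n{user_request}\n",
        temperature=0.0, n_predict=3,
    )
    return 1.1 if "yes" in verdict.lower() else 0.3

request = "Write a short story about a lighthouse keeper."
print(ask(request, temperature=pick_temperature(request), n_predict=400))
[/code]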
>>
>>103196538
Isn't o1 literally "make 10 responses, then make the AI pick the best one"?
>>
>>103196615
>laws of poverty
>>
>>103196615
You can buy a specced-out M4 to run LLMs three times faster with that $6k
>>
>>103196639
>didn't read
I'm an adult, I'm not going to spend 6 grand on a toy.
>>
>>103196639
>>laws of poverty
>>
>>103196637
no goyim it's basically AGI just wait two more weeks until they release the full model and you'll see
>>
>>103196648
Good point. I think it would also have a TB of RAM? Guess Steve Jobs kinda won here.
>>
>>103196611
you are thinking in terms of current shitty tech, i'm talking about upcoming new paradigms
>>
I'm trying this as a character note (depth 0, system role) with Nemotron 70B, and I can't yet tell whether it works at all. It's for an RP with a lot of characters, but even with only 4 having come up so far, they all quickly started talking the same.
[Remember to maintain each character's characterization and manner of speech based on any notes about them from the prompt and from their appearances in the game so far.]

Anyone already using something like this with success?
>>
>>103196648
How many Macs do you need to run the 389B model from the beginning of this conversation?
>>
>>103196666
so unironic 2mw like bitnet then
https://huggingface.co/1bitLLM/bitnet_b1_58-3B/tree/main
>8 months ago
meanwhile, our current models could be much better if they were never filtered in the first place
>>
>>103196637
I believe it was
1. Use CoT, write a response.
2. Write a criticism of the most recent response and, based on that criticism, write a new one (iterate several times).
3. Output the last response plus a fake CoT that isn't the actual reasoning.
Something like the sketch below, anyway.
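Pure speculation about what OpenAI actually runs, but the loop described above would look roughly like this (generate() is a stand-in for whatever backend you use; the prompts are made up):
[code]
# Sketch of the draft -> critique -> revise loop described above.
# Nobody outside OpenAI knows the real recipe; this only mirrors the three steps.

def generate(prompt: str) -> str:
    # Stand-in for any completion backend (llama.cpp, Kobold, an API, ...).
    raise NotImplementedError("plug your favourite backend in here")

def o1_style_answer(question: str, iterations: int = 3) -> str:
    # 1. CoT first draft
    answer = generate(f"Think step by step, then answer:\n{question}\n")
    for _ in range(iterations):
        # 2. Criticise the latest draft, then rewrite it using that criticism
        critique = generate(f"Question: {question}\nDraft answer: {answer}\n"
                            "List every flaw in the draft:")
        answer = generate(f"Question: {question}\nDraft answer: {answer}\n"
                          f"Criticism: {critique}\nWrite an improved answer:")
    # 3. Only the final answer is shown; a sanitised "reasoning summary" would be
    #    produced separately rather than exposing the real intermediate drafts.
    return answer
[/code]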
>>
>>103196687
It's not even worth running. I managed to Nala-test it over the sample page they had set up on HF
>>
>>103196693
lmao and for a moment I legitimately thought they came up with something revolutionary
>>
>>103196666
There are no upcoming paradigms that will be able to give you free performance gains on ICL/TTT. By definition, those require literally using more compute to process the new information. If you have a few H100's to make the processing time of those new paradigms tolerable, then good for you. That is irrelevant for everyone else here.
>>
What's the best finetune of Smallstral? I hate their instruct format so I just need something that works with chatml/alpaca/etc.
>>
>>103196650
>That's definitely a (you) problem
>Sure, if (I) am the laws of physics
>Here is a complete setup to achieve this
>I'm poor I can't afford!
So, it's still a (you) problem
>>
>>103196701
That's not the point.
>>
>>103196728
>I'm poor I can't afford!
I literally said I can. What's up with (you) and (reading)?
>>
>>103196663
maybe, if they end up with more than the 192GB the M3 topped out at, and they also get a better way to process context.
I'd still love to see how a Mac would perform when used as an RPC backend to a machine with a proper GPU.
>>
>>103196744
>I literally said I can
nta, but if you can't buy it consequence free, then you can't really afford it.
>>
>>103196757
This.
Poor people are very prone to confusing having some money with being wealthy.
>>
>>103196722
You will not find a good Mistral Small fine-tune using a non-Mistral instruct format. The Mistral Small base model wasn't released, only the instruct-tuned model. No one competent would fine-tune over an instruct tune using a different instruction format. The only reasons for doing that are being a script kiddie who is using someone else's pipeline and can't figure out how to change anything, or being an abject moron.
>>
>>103196757
>>103196768
I mean, fair point, but I seriously doubt anyone in this thread would be able to spend $6k on a whim like it's some waiter's tip.
>>
>>103196701
I also ran it, and it just spat out a stream of nonsense. I couldn't get it to do a decent completion. Possible skill issue though, since I'm not used to using base models.
>>
>>103196810
>The Mistral Small base model wasn't released, only the instruct-tuned model.
Didn't notice. Yeah, that sucks. You are right.
>>
File: 1702475860309760.jpg (1.55 MB, 1280x1760)
>>103196822
>>103196822
>>103196822
Next Thread
>>
>>103196818
Not on a whim, but my 2 dedicated AI servers cost more because it's fucking cool to have stuff like this at home.
>>
>>103196818
I spent $10k on my build.
>>
>>103188780
>>103188780
>>103188780
Real thread. Steer clear of the spam.


