/g/ - Technology






File: miku_seine_alter_.png (2.63 MB, 1280x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102849995 & >>102838447

►News
>(10/16) Ministral 8B instruct model released: https://mistral.ai/news/ministraux/
>(10/15) PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b
>(10/15) Llama-3.1-70B-Instruct customized by NVIDIA: https://hf.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
>(10/14) Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
>(10/14) Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: GYlCgrqasAAjKX-.jpg (20 KB, 462x370)
►Recent Highlights from the Previous Thread: >>102849995

--Papers:
>102855851
--Recommendations and considerations for running models on a 2060super 8gb setup:
>102851319 >102851349 >102851356 >102851375 >102851414 >102851704 >102851931 >102852032 >102852056 >102852084 >102851972 >102851483
--How to understand and implement samplers in AI models:
>102854191 >102854542 >102854634
--Nemotron 70b resists explicit story direction:
>102856315 >102856323 >102856345
--Mistral's performance on trivia questions:
>102852002 >102852061 >102852178 >102852203 >102852201 >102852232 >102852310 >102854908 >102852409 >102852312 >102854622 >102852481
--L3.1 Nemotron Instruct at Q6K shows promise but falls short in RP:
>102858009
--H100 worth the price, but depends on the task and setup:
>102852441 >102852670 >102852712 >102852744 >102852753 >102852964 >102853007 >102853223 >102853246
--Ministral-8B-Instruct Nala test shows improvement over Nemo:
>102850925 >102850945 >102850962 >102851092 >102851019 >102851030 >102851023
--Ministral ggufable but has issues at long context:
>102851380 >102851421 >102851455 >102851473 >102851488 >102851558 >102851565 >102851611 >102851713 >102851737 >102851828
--Debate on AI surpassing human-level intelligence and its capabilities in ERP and TTRPG:
>102850496 >102850808 >102852429 >102853638 >102853941 >102853974 >102853963 >102854001 >102854095 >102854133 >102854169 >102854215
--8B model performs well for RP purposes and holds up under quantization:
>102851186 >102851548
--New ooba feature allows download cancellation with ctrl+c:
>102850112 >102850232 >102850266 >102853434 >102853471 >102858260
--Models have video game knowledge but struggle with trivia questions:
>102853355
--Miku (free space):
>102850413 >102850771 >102851605 >102851900 >102853205 >102854227 >102854365 >102855777 >102856017 >102859296 >102861263

►Recent Highlight Posts from the Previous Thread: >>102850022

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
https://x.com/_xjdr/status/1846944172445782408
>>
File: 1708184095859170.png (130 KB, 488x497)
>>
damn france was the shit at some point, wasn't it
>>
>>102862181
Yes, but then we lost a world war and got flooded with 40IQ browns.
>>
>>102862181
they were pretty good until the EU destroyed them along with the rest of Europe, now they survive on tourism like some third world country
>>
File: 4918.jpg (108 KB, 882x1232)
It's over...
>>
Nemotron beats every other 70B I've ever used for RP so far. Not sure if it's better than Mistral Large tunes though. It has some oddities with its formatting.
>>
>>102862255
Who is this arx person
>>
So is the novideo 70B finetune actually better than largestral or is it just a meme?
>>
>>102862303
bench gamed meme.
>>
>>102862303
It's hard to compare. I really like its prose, and it's quite smart / deeply introspective, which is good for RP stuff.

>>102862313
It's human preference tuned, and all that does is increase its personability. It's worse at coding than base llama but it's FAR better at creative uses in my testing so far. It's actually creative and interesting unlike dry 3.1
>>
>>102861776
This is really fun, I'll definitely give adding it to Mikupad a try.
>>
I just came back from a two week vacation. Updated oobabooga and now all my models run 4-5x slower, from 20 t/s to 3 or 4 even with <10k context. I have a 1080ti with 7GB vram used. Tested with the new ministral 8b and magnum 12b v2 gguf with q4ks variants.
Did something happen to ooba while I was gone? Should I be using something else for my interface?
>>
>>102862303
They show off Nemotron solving the strawberry riddle right on the model card. It's also clearly been trained on Sally and probably most of the reddit riddles. Human preference alignment means that it's just a bigger Starling.
>>
>>102862361
Yeah, koboldcpp/tabbyapi is the new meta
>>
>>102862388
koboldcpp is the same, must be a llamacpp issue. Guess I'll have to wait for a patch.
>>
File: file.png (110 KB, 200x232)
>>102862361
>he pulled
>>
>>102853138
>>102853177
>>102855144
It could also be a methodology problem. Just looking at top-k=1 completely ignores any changes to the probability distribution that didn't bump the most likely token from first place.
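Something like this minimal sketch would check it properly, comparing whole distributions instead of just the argmax (model names and the prompt are placeholders, and it assumes both checkpoints share a tokenizer, e.g. a base model and its finetune):
[code]
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def next_token_probs(model, tok, prompt):
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]  # logits for the next token
    return F.softmax(logits, dim=-1)

tok = AutoTokenizer.from_pretrained("model-a")       # hypothetical checkpoint
a = AutoModelForCausalLM.from_pretrained("model-a")
b = AutoModelForCausalLM.from_pretrained("model-b")  # e.g. a finetune of model-a

p = next_token_probs(a, tok, "Once upon a time")
q = next_token_probs(b, tok, "Once upon a time")

print(p.argmax() == q.argmax())                      # what top-k=1 checks
print(F.kl_div(q.log(), p, reduction="sum").item())  # KL(p||q): what it misses
[/code]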
>>
>>102853205
>>102858490
Training an instruct model on unformatted raw text strikes me as methodologically unsound.
>I ran a fairly strong LoRA on it using a private raw-text dataset. The results were 'overcooked' so I did a 50/50 SLERP merge back onto the original model and this is the result of that merge.
>>
when will 8b beat aicg?
>>
>>102862361
How were you using ministral 8b two weeks ago?
>>
>>102862255
Isn't MMLU a benchmark for knowledge evaluation? They only trained Nemotron to be aligned with arena preferences, so their training wouldn't add anything to its knowledge.

Nemotron is much better for RP and creative writing than the base model. That's what matters.
>>
>>102862259
I second this opinion. It's the best 70b model I've used for RP.
>>
>>102862259 >>102862347 >>102862902 >>102862918
What main prompt are you using, what instruct template, and what sampler settings?
>>
Why do I get horrible slowdown in ST when using group chats?
>>
>>102862990
Get your own
>>
>>102862999
Probably because the context has to be reprocessed every time, since the card information of each character is different and sits high in the context.
>>
>>102862918
>>102862259
Didn't it fail the Nala test?
>>
>>102862990
Try this:
https://files.catbox.moe/wwtnkf.json

Regex:
https://files.catbox.moe/qs0dwf.json
>>
>>102863031
CoT with a cute formatting regex, will give it a try and A/B it against Llama 3.1 Instruct. What sampler settings have you personally used when experiencing good results?

>"allow_jailbreak": false
I always wonder when I see this whether someone left it at the SillyTavern default or if they tried and saw
<|start_header_id|>system<|end_header_id|>
doesn't work for injecting instructions after the start.
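For anyone out of the loop, that token comes from the Llama 3 Instruct template, which from memory looks roughly like this (double-check against the official tokenizer config before relying on it):
[code]
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{response}<|eot_id|>
[/code]
The jailbreak trick is injecting a second system header mid-context in that same shape, and as said above it doesn't reliably work on this tune.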
>>
>>102863024
Yeah, I had to use the "join character cards" instead of the default of switching them.
>>
>>102862990
0.21 smoothing, 0.03 min p, temp 1, DRY on.
>Instruct template
Basic llama 3 Instruct on Sillytavern
>System Prompt
"This roleplay consists of alternating messages between Assistant (you) and Human (the user). Human and Assistant take turns to add to the story, and this continues indefinitely.

Both {{user}} and {{char}} are major characters in the story, with other side characters taking on a supporting role.

There are strict rules for the contents added in each turn:
Human turn: Describe only {{user}}'s actions, dialogue, thoughts and feelings.
Assistant turn: Write only general story narration and the actions/dialogue of {{char}}. You cannot control or imply {{user}}'s thoughts or actions.

Note: Text that is formatted with parentheses is out of character and is directed to you outside of the role-play. If you are sent an OOC request, then it must be obeyed and implemented immediately!

(OOC: This is an example of an out-of-character message.)"
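As a SillyTavern-style preset, that would be something like this. The field names are from memory and the DRY values are the defaults since I only toggled it on, so treat it as a sketch rather than an exact export:
[code]
{
  "temp": 1.0,
  "min_p": 0.03,
  "smoothing_factor": 0.21,
  "dry_multiplier": 0.8,
  "dry_base": 1.75,
  "dry_allowed_length": 2
}
[/code]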
>>
https://github.com/xjdr-alt/entropix/tree/70B/entropix
>>
File: 1722759829144264.jpg (124 KB, 850x1000)
LLM powered VNs when?
>>
>still no 70b natively multimodal model
holy shit these niggers really dont want to risk anything, they just want that extra 1% mmlu pro
>>
>>102863432
What's the use case?
>>
>Llama.cpp still no SWA support.
>Llama.cpp still no multimodal support.
What the fuck do they do all day then, huh?
>>
Hello, /lmg/, my old friend.

What literature should I read to fine-tune a coding-oriented LLM on my codebase?
>>
>>102863509
the fact that it would unlock the biggest functionality of AIs aside from AGI: the ability to properly interact with GUIs, you know, the things that let you interact with everything that was made with human eyes in mind
>>
File: file.png (82 KB, 1111x527)
I've been using Miqu Midnight for a long ass time and I just downloaded Nemotron to test it
how the fuck is it so annoying right from the start
>>
>>102863563
maybe test it on an actual use case instead of grading its specific greeting message intstruction training gorilla nigger?
>>
>>102863563
What is your problem?
>>
>>102863516
>Llama.cpp still no SWA support.
Then who is running these Nemotron 70B ggufs and how?
>>
Minitron is pretty good, easily the best 8B I've tried.
>>
>>102863596
From what I understand, the new interleaved sliding window attention mechanism borks ministral quants. Until llama.cpp gets on it you might as well not even touch ministral.

>still no llama 3.2 support either.
It's like they want their project to die.
>>
>>102863621
Is it better than nemo 12B?
And if so, in which ways?
>>
>>102863596
>Nemotron 70B ggufs
based on l3.1 and uses rope scaling, not swa?
>"_name_or_path": "meta-llama/Llama-3.1-70B-Instruct",
>"rope_scaling": {
https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/main/config.json
>>
>>102863375
As soon as we get agi.
>>
>>102863692
I've only tried Nemo 12B RPMax and it seemed pretty dumb to me. Minitron seems comparable to Mistral Small.
>>
>try nemotron 70B
>shivers down the spine on the first response
I'm not trusting you people again.
>>
>>102863825
I'm sure your prompt had nothing to do with it.
>>
File: 8th-snitch.jpg (38 KB, 750x778)
>>102863825
lol.
lmao.
>>
>>102863862
My prompt is high tier literary fiction with some fetish stuff. Even Nemo merges did better than that. Sad!
>>
>>102863825
You learned a lesson.

Supplemental lesson: a sign of bullshit is using hyperbolic language ("best ever") as a substitute for details and examples. If something is so great it is very easy to give an example of how it blew you away.
>>
>>102862807
I wasn't, I tested it today to compare against my older models. I used to just switch between gemma 2 9b sppo, magnum and nemo.
>>
Anyone know if there is a way to disable markdown rendering in SillyTavern?
>>
File: Untitled.png (98 KB, 638x346)
>>102863916 (Me)
And I'm not memeing here. It can realize it's reading something of high quality. Unfortunately it can't keep it up; instruct LLMs are usually bad at following style without finetuning.

>>102863940
That's why I try them online if possible before downloading, so no time/bandwidth wasted.
>>
>>102864095
>It can realize it's reading something of high quality
kek
>>
nemotron 70b really likes to do choose-your-own-adventure roleplay
>>
Is there any local model that can trade blows with my dick yet?
>>
>>102864360
im sure if you look long enough you will find a local whore that used to be a model.
>>
>>102864360
>my dick
Any 2B model should be just about the right size.
>>
>>102864410
not a pedo
>>
>>102864095
interesting
i'm going to try putting "genre: high quality literature" in my author's note next rp session
maybe it'll work like "masterpiece, 1girl" the way image diffusion used to work
>>
>>102864460
Purple prose will shoot through the roof.
>>
>>102864460
That doesn't really work. LLMs weren't trained with such tags so they don't really know what high quality literature reads like. It will either not do much at all, or make it vomit purple prose and meaningless filler.
The best you can do is fill your context with good quality text and hope it picks up the patterns.
>>
>>102864489
>>102864686
Not the first anons that don't pick up on sarcasm... or have they been here for a bit? hmmmm?????
>>
File: 1722529467985490.jpg (281 KB, 1024x1024)
I believe there are a few reasons why we do in fact want a model that's good at trivia rather than focusing solely on smarts.

First, we already know that models trained on something perform better at that thing than a model not trained on it that instead gets the information from RAG. It's not surprising that someone who is familiar with something is able to talk about it with more nuance and understanding than someone who isn't.

Second, a model that knows trivia also means it is more likely to have other kinds of creative text in its training data, so it could result in a more creative model overall (if the fine tune doesn't screw it all up).

Third, people love references, allusions, and memes in their media and entertainment. Why shouldn't at least one model in existence be able to spontaneously make those kinds of references unprompted? Imagine if you were doing a lighthearted Halo story/RP and the model suddenly references the Halo guy meme at some point (when it makes sense to). Wouldn't that be cool and natural? A model that can do that would be pretty fun.

While we need models to get smarter, if our personal goals are to have fun with models, then trivia should not be forsaken entirely. A balance of trivia knowledge and non-trivia knowledge would be important. And we can still have different models made by different people that focus on different things, wouldn't that be great?
>>
nigger, faggot, troon, and you know what? nigger, faggot, and troon again.
>>
>>102864729
Your point is...?
>>
>>102864729
>words words words
If you want trivia so much, just use DBRX and let us know how much you enjoy it.
>>
File: trash.jpg (43 KB, 658x439)
>She sighs wistfully. "Every part of this encounter becomes an exquisite dance between sight, touch, and imagination!"
>>
>>102864489
>>102864686
>purple prose
it's doing something much weirder, "genre:high quality literature" raised the value of everything because of "high quality" i guess.
like usually the rp i use goes
>trashy girl comes over to my apartment at midnight, bitches about how much of a shithole my apartment is
>wants me to (let her crash because she got kicked out of her place/ beat the shit out of her stalker or exboyfriend/steal some money off a drug dealer)
now it's
>sultry girl comes over to my apartment at midnight, marvels at how nice it is
>wants me to (steal a monet painting worth millions)
>>
>>102864778
for me it's Victor Vex that shows up for some reason in multiple chats

>Suddenly, the sound of footsteps echoes from outside the warehouse. A figure, dressed in a long, black coat, steps into view. It's Victor Vex, a tall, lean man with piercing green eyes and jet-black hair. He's known for his insatiable lust and his love of watching others suffer.

>Victor Vex: his voice is low and menacing "Well, well, well… what do we have here?
>>
>>102864758
That, unlike what someone said last thread, trivia is not a bad thing for models to know.

>>102864762
>having an allergy towards reading, in a hobby with copious amounts of reading
Ironic.
>>
>>102864826
What model? I've never seen that name.
>>
>>102864831
>dismisses my point without addressing it
Ironic. I look forward to seeing your mindblowing DBRX logs that change everyone's mind about smarts over trivia.
>>
I just had a thought. What about low quality erotica? My human brain instantly imagines: "i suck yo dick. i do it fast" but maybe low quality erotica or low quality erotic roleplay is actually what we want?
>>
>>102864729
1. What model/quant for this?

2. The right kind of training data to teach memes/reference is data that *integrates* that trivia, not that shits it out in response to pub quiz question prompts.
>>
>>102864778
me throwing out another cum rag after long shiver session
>>
>>102864867
I only did the same thing that you did to my post. Your dismissal doesn't actually make sense in the context of the original post, if you actually read it.
>>
I firmly believe we need sloppier models.
>>
File: god.png (50 KB, 1291x193)
>>102864868
Chronoboros 33B on high freq penalty once spewed me this apex creation of impersonation.
>>
>>102864889
It's ok to admit you don't know what DBRX is
>>
>>102864913
The most shocking thing to me is how it writes YESSS 10 times and somehow resists the siren call of doing it again.
>>
>>102864868
>>102864913
I really need an AI to rub its fazzlenudge against my gigglestick.
>>
>>102864870
1. I don't know. I don't test trivia a ton. But so far it feels like Mistral Large even at Q2 is fairly smart and creative without dipping into being a dry smarty pants model or a creative but pants on head retarded model.

2. That sounds like the right approach and there's no reason to think that I disagree. I believe if we are to get a trivia benchmark that truly tests trivia knowledge, it should be something that's not multiple choice but somehow tests how likely the model is to spontaneously make a reference.
>>
>>102864868
>author note: Genre: Low quality smut
it's beautiful.
>>
>>102864921
It's ok to admit you didn't read or understand my post.
>>
>>102864921
>>102864969
Now... kiss!
>>
>>102864814
>>102864965
You've got to be kidding that it actually works this way...
>>
Nemotron is the best I've used for humor. It's actually funny and inventive.
>>
A hallucinated game world with an intelligent Miku in it
>>
>>102865038
>an intelligent Miku
an oxymoron if I ever heard one
>>
>>102864965
>low quality smut is actually higher quality
>>
>>102864989
>>102864965
So what if you try this on a different model? Is it really working this way because of Nvidia's tuning?
>>
>>102863375
LLMs can't keep secrets, they would spoil everything on the first scene
>>
>>102865055
LLMs can't keep memories, they would forget everything after the first scene
>>
>>102865052
i (second person quoted) am actually using Rocinante-12B-v2g-Q4_K_M and no i won't buy an ad.
>>
>>102862756
I'm sorry, I don't speak reddit.
>>
>>102865055
Wrong >>>102242181
>>
>>102864729
I think I demonstrated yesterday that the knowledge is there - it just hasn't been well generalized into the instruct behavior.
>>
>>102864965
SVOL
>>
>>102865076
>i (second person quoted) am actually using Rocinante-12B-v2g-Q4_K_M

>Arsenal (Supported Chat Templates)
>* ChatML for RP
>* Alpaca for Story / Instruct Adventure
>* Mistral for NeMo
>* You can mix it up and see which works best for you.

What makes you the way you are, having not merely downloaded but run what is on its face a defective merge made by a moron?
>>
>>102865201
What makes you the way you are, having not merely shat on a perfectly good merge without having so much as tried it yourself but being a moron and a nuisance here without contributing anything yourself?
>>
>>102865118
I saw that. I think that's essentially a case of shallow knowledge as opposed to deep knowledge. It might've been trained on texts that directly have that information, but not any texts that reference or manipulate the information in other ways. It might be something that can't be solved with only fine tuning.
>>
>>102864965

Too good to be true. This would be the biggest 'gotcha' moment for LLM cooming. No fucking way it's this simple.
>>
>>102865228
No really, tell me. Are you underaged? Did your mother drink while she was pregnant with you? Are you a non-native English speaker like the 'tard who excreted that negative-value-added merge of other people's fine tunes and pretended it was his original work? Explain what made you think any part of that is competent or acceptable.
>>
>>102865273
I'd be curious to see how a larger Mistral reacts to it. Maybe it really does work this way? Or the issue is that while it writes in a more preferable way, it also becomes more retarded.
>>
>>102865306
Can you go sperg out somewhere else? We might be on to something big here.
>>
tell me what to think about nemotron 70b
>>
>>102865433
it's ok
>>
>>102865400
No. THIS is my sperging space, and I won't have it polluted by begging jeets and their braindead simps. Ignoring mentions of shittunes proliferates newfriends thinking they're fitting in by using them.
>>
>>102865433
It's unique. Its prose is very different from llama / mistral models, closer to claude than anything I can name. It's also pretty smart and has a "deeper perspective" for RP, not sure how else to explain that. I suggest trying it. I really like it.
>>
>>102865448
Flowery prose and more fun to use than largestral, which is surprising coming from a benchmaxxed corpo model. It's actually not that biased towards the assistant personality, having checked the logits with a blank prompt.
>>
>>102865448
leatherjacketman plz
>>
Why does llamacpp seem so broken on Sillytavern for me? When I swipe to generate a new response it almost repeats it back to me verbatim, minus a few changes in words. Using ooba. Doesn't happen with koboldcpp.
>>
>>102865764
Because your settings are fucked
>>
>>102865764
It is not almost. It just repeats itself. Started a few pulls ago. Still not fixed.
>>
I gave Nemotron 70B a try and it's actually not as bad as one would expect from something being shilled here. It does feel like the model has a better understanding of how to act in a RP than most models, although it's certainly slopped.
>>
>>102865796
>it's certainly slopped.
It still has its shivers but I really like its prose otherwise. Not sure how nvidia made 3.1 better at RP than finetuners did.
>>
>>102865777
Is the problem with ooba or sillytavern?
>>102865776
I don't think so. I'm using the same setting as I do with koboldcpp
>>
>>102865818
Maybe they are the only hope we have? They make money making gpus. Their AI division is probably some playground division. Maybe they will actually make a coom model when nobody is looking or cares what they are doing?
>>
And now that I thought about it some more if jewvidia forces buyback into agreements and they know that none of the reputable companies will make a cooming model... it is actually in jewvidia's interest to make a cooming model to increase demand for their products? Will you worship leatherjacket man if he delivers?
>>
>>102865897
The ultimate cope...!
>>
Do you need an equal amount of RAM and VRAM? I have a 4090 and 3090, but I'm considering getting either another 3090 for 72 or an A4000 for 64, but I have only 64GB of RAM.
>>
>>102866088
You don't. Also, the A4000 would bottleneck your speed.
>>
is there a 8-12b model capable of fulfilling my cringe japanese high school romance rp yet?
real coherent and interesting like in VNs?
>>
>>102866127
But don't you need twice the RAM? Would I be able to run a model 8GB larger with the extra 8GB?
>>
>>102866160
>real coherent and interesting like in VNs?
Like the lowest common denominator for both writing and videogames? Sure. The best you can run as a vramlet is mistral nemo or a finetune. Start with the original instruct and test finetunes if it's not enough.
>>
File: claude logo.png (165 KB, 400x240)
https://x.com/atroyn/status/1846935326058827948
>>
File: chatlog (37).png (184 KB, 830x516)
I like Nemotron.
>>
>>102865448
Agree about the relatively unslopped prose, but it's too much dumber than Largestral for me to use.

I don't blame Nvidia for it though, 3.0/3.1 70B both have this weird quirk where they'll give you two good, sensible generations and the third one will be inexplicably completely retarded, with a huge logical error or non sequitur you'd expect from an 8B model. NovelAI's new model (based on 3.0 70B) has the exact same issue. It's a shame Nvidia's tune wasn't able to beat that out of them.
>>
https://www.reddit.com/r/ChatGPT/comments/1g5s4i2/has_science_gone_too_far/
>>
File: file.png (2.36 MB, 1159x1125)
>>102866397
WOULD, YOU HEAR ME ANON??? I WOULD FUCK THAT PHONE
>>
>>102866351
that's a good vonnegut book
>>
Hi all, Drummer here...

>>102865201
>>102865228
v2g is not a merge.

>>102865306
>>102865447
I know who you are. I hope you can find inner peace and develop empathy. It's sad to see someone so unhinged and full of hate. I worry about you.

If you don't have a close friend, or can't afford a therapist, then maybe you could try talking to this model: https://huggingface.co/TheDrummer/Buddy-2B-v1

It'll walk you through your frustrations, and maybe help you discover what's wrong. Try to work on yourself before it's too late.
>>
>>102866466
62 75 79 20 61 6E 20 61 64
>>
>>102866466
tell me about new dawn
>>
>>102862116
>-Models have video game knowledge but struggle with trivia questions
Honestly, if those autists who create and maintain video game wikis didn't exist, current models would be a whole lot dumber when it comes to video game knowledge. Hats off to them.
>>
What are some local models that are helpful for NOT jerking off?
>>
>>102866558
if you want SFW models, then go for Claude or OpenAI?
>>
>>102866473
6E 6F
>>
>>102866466
Hi TheDrummer, why did you mix three instruct formats in one fine tune? It's kind of hard to believe someone did that on purpose.
>>
>>102866675
I think it's a fun idea to have three instruct formats that behave differently and package them in one model. You can switch around the three for different levels of smarts, creativity, and prose.
>>
>>102866795
Haha, what a fun and quirky idea!
>>
Ah, another wild rodeo with lmao.cpp.
Cannot wait until my goofy file downloads and everything just works first try.
>>
>>102866214
>Would I be able to use an 8gb larger model with extra 8gb
You would, but unless you're CPUmaxxxing on DDR5 RAM it's gonna be painfully slow.
Also, if you can go for the 3090, do it; the A4000 is only good if you're low on slots/power. Its memory bandwidth gets totally mogged by 3090s.
>>
File: 1727123885469839.jpg (51 KB, 640x636)
Man, why the FUCK is nemo 12b always trying to get me to eat my own cum?
>>
>>102866558
Qwen2.5 will moralize you to sleep if that's what you're looking for. Codestral is my guy for code completion.
>>
>>102866928
It's a retard.
>>
>>102866928
>>102866972
well have you tried it? maybe its on to something
>>
elon actually delivered https://x.com/SawyerMerritt/status/1846799881597559014
>>
What's the best model you could run locally with 96gb of VRAM? Specifically looking at code assistance.
>>
>>102867146
not local
not a model (it's controlled by a human remotely)
>>
>>102867210
Mistral Large 5.5bpw or Qwen2-72b 8bpw
>>
>>102867211
Local and controlled by an AI model now; humans do human shit, record all motion data, then use it for said AI model training. Teleoperated data is the only way to teach it to do the stuff you want.
>>
I haven't been here since August, what's the new meta?
>>
>>102867389
death
>>
>>102867389
Nothing has changed unless you're a poorfag who runs 20b models
>>
>>102867389
meta is dead.
>>
Nemotroon is sending all the right shivers down my spine, nvidia have done it again.
I think they have the best RP multi-turn dataset in the local sector. Just a touch of anti-sloppa makes the model fucking godlike.
>>
is nemo 12b still best for a poorfag?
>>
nemotron 70b is pretty censored. I think it'll need a jailbreak to get it to be willing to output ERP.
>>
I checked up on https://app.primeintellect.ai/intelligence
Looks like the pace has picked up a bit, so this training run might complete in less than 100 days after all.
>>
>>102867571
Pretty much. Some people will suggest a tune of it, but I have had the most success with the official instruct and just a little wrangling.
>>
What do you guys think about the new changes vedal did to add a bunch of agent features into neuro? https://www.youtube.com/watch?v=qev-dEfuomQ
>>
>>102858904
Thanks I will give it a go when I get up to speed

>>102859266
Yeah, saw about that. I'm gonna ignore the increases though, ha, until there's a large leap somewhere in tech, which feels not far away.

>>102858868
I am also running 2 4090s and I just wanted to load big LLMs for my slower tasks with the extra RAM. I wouldn't recommend the glacial rate of CPU-only compared to GPUs
>>
File: 1728766845557788.png (177 KB, 2394x646)
nvidia's "Sana" https://nvlabs.github.io/Sana/
>>
File: 1714044647085267.png (55 KB, 717x546)
>>102867726
https://arxiv.org/pdf/2410.10629
>>
>>102867664
I don't think about it at all.
>>
>entropix

lemme know if anyone tests it out with examples
>>
>>102867664
I don't know what any of those words mean.
>>
>>102868471
Tried it the other day and it seemed broken. Failed 9.9 vs 9.11 every time except the one time it failed to answer at all. Need two more weeks to bake this nothingburger.
>>
have you guys ever begun to feel your cock stir after she says something barely above a whisper?
>>
>>102867664
I think you should buy an ad, but this seems genuine, so buy a map instead since you're lost
>>
>>102868582
How are you guys not interested in making an AI gf that could do things like that?
>>
Did another character card site get shut down recently because for like the past 3 months chub has had absolutely dogshit quality uploads from pajeets and brown hands. These cards are utter dogshit. It wasn't like it was great before but now it's an epidemic. I was trying with the cards I created and uploaded but they just immediately get drowned out by the flood of uploaded shit.
>>
>>102868536
yeah, that's always shortly before, with a strangled cry, I release what feels like a gallon of cum
>>
Improving Instruction-Following in Language Models through Activation Steering
https://arxiv.org/abs/2410.12877
>The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. These vectors are computed as the difference in activations between inputs with and without instructions, enabling a modular approach to activation steering. We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion, providing inference-time control over instruction following. Our experiments across four models demonstrate how we can use the activation vectors to guide models to follow constraints even without explicit instructions and to enhance performance when instructions are present. Additionally, we explore the compositionality of activation steering, successfully applying multiple instructions simultaneously. Finally, we demonstrate that steering vectors computed on instruction-tuned models can transfer to improve base models. Our findings demonstrate that activation steering offers a practical and scalable approach for fine-grained control in language generation.
kind of interesting. seems decent at using the vectors to steer response lengths by number of sentences. models tested were small and no code though
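no code, but the core idea is simple enough to sketch. here's a toy version (GPT-2 as a stand-in for the paper's larger models; the layer choice and the single-prompt, last-token difference are my own simplifications, not the paper's recipe):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in model, not from the paper
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # which block to steer; picking this is part of the tuning

@torch.no_grad()
def last_hidden(prompt):
    # hidden state of the last token at the output of block LAYER
    hs = model(**tok(prompt, return_tensors="pt"),
               output_hidden_states=True).hidden_states
    return hs[LAYER + 1][0, -1]  # hs[0] is the embeddings, so block i is hs[i + 1]

# steering vector = activations with the instruction minus without it
vec = last_hidden("Answer in exactly three sentences. Describe a cat.") \
    - last_hidden("Describe a cat.")

def steer(module, inputs, output):
    # add the vector to this block's output at every position
    return (output[0] + vec,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("Describe a dog.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=50, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # model behaves normally again
[/code]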
>>
Deepseek Janus support for llama.cpp soon?
>>
File: Untitled.png (2.07 MB, 1080x4094)
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
https://arxiv.org/abs/2410.13846
>Recent advancements in large language models (LLMs) have extended their capabilities to handle long contexts. However, increasing the number of model layers and the length of input sequences significantly escalates the memory required to store key-value (KV) cache, posing challenges for efficient inference. To mitigate this issue, we present SimLayerKV, a simple yet effective method that reduces inter-layer KV cache redundancies by selectively dropping cache in identified lazy layers. Our approach is based on the observation that certain layers in long-context LLMs exhibit "lazy" behavior, contributing less to modeling long-range dependencies compared to non-lazy layers. By analyzing attention weight patterns, we find that the behavior of these lazy layers is consistent across tokens during generation for a given input. This insight motivates our SimLayerKV, which identifies lazy layers and reduces their KV cache accordingly. SimLayerKV is training-free, generalizable, and can be implemented with only seven lines of code. We conduct extensive experiments on three representative LLMs, e.g., LLaMA2-7B, LLaMA3-8B, and Mistral-7B across 16 tasks from the LongBench benchmark. The results demonstrate that SimLayerKV achieves a KV cache compression ratio of 5× with only a 1.2% performance drop when combined with 4-bit quantization.
https://github.com/sail-sg/SimLayerKV
looks pretty simple to use yet still pretty effective. neat
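the gist, paraphrased (their actual seven lines are in the linked repo; the sink/window sizes and threshold here are made-up placeholders):
[code]
import torch

def is_lazy(attn, sink=4, recent=1024, thresh=0.9):
    # attn: (heads, q_len, k_len) attention weights for one layer.
    # "Lazy" = the newest query's attention mass sits almost entirely
    # on the first few "sink" tokens plus the recent window.
    last = attn[:, -1, :]
    mass = last[:, :sink].sum(-1) + last[:, -recent:].sum(-1)
    return (mass.mean() > thresh).item()

def trim_kv(past_key_values, lazy, sink=4, recent=1024):
    # drop the middle of the KV cache for layers flagged as lazy
    out = []
    for (k, v), flag in zip(past_key_values, lazy):
        if flag and k.shape[2] > sink + recent:  # k: (batch, heads, seq, head_dim)
            k = torch.cat([k[:, :, :sink], k[:, :, -recent:]], dim=2)
            v = torch.cat([v[:, :, :sink], v[:, :, -recent:]], dim=2)
        out.append((k, v))
    return tuple(out)
[/code]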
>>
>>102868969
Probably never honestly.
>>
File: Untitled.png (402 KB, 1080x1223)
A Little Human Data Goes A Long Way
https://arxiv.org/abs/2410.13098
>Faced with an expensive human annotation process, creators of NLP systems increasingly turn to synthetic data generation. While this method shows promise, the extent to which synthetic data can replace human annotation is poorly understood. We investigate the use of synthetic data in Fact Verification (FV) and Question Answering (QA) by studying the effects of incrementally replacing human generated data with synthetic points on eight diverse datasets. Strikingly, replacing up to 90% of the training data only marginally decreases performance, but replacing the final 10% leads to severe declines. We find that models trained on purely synthetic data can be reliably improved by including as few as 125 human generated data points. We show that matching the performance gain of just a little additional human data (only 200 points) requires an order of magnitude more synthetic data and estimate price ratios at which human annotation would be a more cost-effective solution. Our results suggest that even when human annotation at scale is infeasible, there is great value to having a small proportion of the dataset being human generated.
https://github.com/DhananjayAshok/LittleHumanData
looks like Miku will have a reason to keep us around
>>
https://huggingface.co/deepseek-ai/Janus-1.3B
>>
>>102869111
Huge version soon?
>>
>>102869128
>236b image generating multimodal
god I wish
>>
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
https://arxiv.org/abs/2410.13229
https://github.com/enyac-group/Quamba
no code yet. doesn't mention mamba 2 so not sure if the architectural changes render this nonfunctional for it. works for jamba. eh
>>
>>102869111
gguf?
>>
File: 1717027853525879.png (452 KB, 850x611)
>>102869101
Synthetic data is not sufficiently diverse, that's all there is to it.
>>
>>102869424
I feel like it could be if only they'd use "unsafe" models to generate it.
>>
We use human data to train models to generate fake data to train models. They should work on making them better without this.
>>
>>102869424
Do you know what subject the red cluster on the right is? Or where did you get this from?
>>
>>102869479
Loli fiction
>>
>>102869479
It's an old paper https://www.researchgate.net/publication/370228047_SocialDial_A_Benchmark_for_Socially-Aware_Dialogue_Systems the model was GPT3, but nothing really changed
>>
File: whoveryniceofthem.png (79 KB, 428x371)
79 KB
79 KB PNG
>>102869492
Thanks for the link. I'll read it in more detail later.
>>
>>102869628
they used the ai to write the ethical considerations boilerplate...
>>
>ooba is deeply fucked again
it's all so tiring
>>
Am I allowed to install lama?
>>
>>102870871
>using gradio shitware
you get what you deserve
>>
File: 1720533839023564.jpg (55 KB, 500x500)
is 15 lines too long for a system prompt, or is it a skill issue
>>
>>102871090
if your sysprompt is fewer than 10k tokens you have very simple needs
>>
>>102871090
Being able to express yourself without rambling is a skill.
>>
>>102871090
>counting in lines instead of tokens
It's already over
>>
>>102871228
It's simple math, anon. Assuming they are full lines, at roughly 45-50 tokens per line, 15 lines is about 700 tokens.
>>
>>102869424
But the models trained on synth data will be very good at whatever the red sector does
>>
>>102871323
No, that's why full synthetic data is tanking the accuracy here >>102869101
>>
File: ComfyUI_05091_.png (267 KB, 1024x1024)
>>102870886
>>
>>102870886
You can have a little bit llama
>>
GOOD MORNING SIRS
qrd on ministral 8b? does it have SOVL or do i have to stay with nemo finetunes for now?
>>
>>102871144
I mostly just do a lot of stripping and raping, so I guess 2k tokens. It's still a bit of a chore getting it to describe their genitals without also describing their hard nipples under a full burka, though.
>>
>>102871708
When it's good... it's really fucking good. But it can also be a bit hit and miss at times.
>>
>>102871828
so just like every other model?
>>
>>102872145
Nobody asked you.
>>
>>102872173
NTA but I like to hear what he has to say.
>>
>>102872206
Literally all xhe does is flail around bemoaning literally anything other people enjoy. You can replicate that at home by being an abject failure of a human being.
>>
>>102872223
How do you identify xir?
>>
>>102871708
I think users should check it out once an official Transformers version is uploaded on HuggingFace. The one MistralAI recommends running with vLLM seems broken in various ways.
>>
>>102872230
Intuition. They can identify you across threads though because they're one of the mods. They've accidentally let this ability slip in the past. I guess the admins reined their ass in. But in the earlier days of the threads you'd get a 3 day vacation any time you gave them any friction back.
>>
>>102872242
It was just a simple comment about how models are very unpredictable, which they are.
>>
>llama3 is completely filtered of bad thoughts
>facebook is still doing mass censorship despite what zucc said
>lecun lost his mind over musk and is gradually being exposed as a hack
False prophets
>>
>>102872458
It's still surprising to me how crazy Musk makes some people.
>>
>>102871708
Finicky, copies formatting rigidly, doesn't follow/understand formatting instructions. Same as Nemo, I guess?
>>
Nemotron 70B is so good, I feel like crying
How long will we be dependent on corpos releasing kino instead of doing it ourselves?
>>
>>102872746
Is there any uncensored finetune out yet?
>>
>>102872787
How does one overcome skill issue via finetuning?
>>
>>102872787
>finetune of a finetune
get a load of this guy
>>
>>102872787
There are no uncensored fine-tunes, just horny sloptunes.
>>
>>102872857
Is there any horny version of Nemotron then?
>>
>>102872863
not yet
>>
Unironically gonna buy a second gpu for nemotron.
Fuck you Jensen, you double nigger, you got me.
>>
What do you even run it on? My ooba completely dieded.
>>
>>102873087
Yours complains about dependency?
I had an issue after updating yesterday, I just updated it again and it started working.
>>
>>102873087
Why are you unironically using ooba, that's like admitting to being a llm boomer
>>
https://huggingface.co/deepseek-ai/Janus-1.3B
https://github.com/microsoft/BitNet
>>
>>102873151
Shieet, first nvidia releases sota and now bitconect comes out?
Back bros, we're so back.
>>
>>102873119
How do I start using ooba ironically?
>>
File: 1727972056445148.png (8 KB, 424x133)
>>102873151
it's over...
>>
>>102873169
Hey hey heyyyy
>>
>>102873151
>Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second),
Alright, okay.
>>
>>102873119
>Running Kobold
Go suck henk's dick on the 'cord you faggot
>>
>>102873151
>bitnet.cpp
Okay that's nice where model?
>>
File: 1713562416948943.png (37 KB, 825x433)
>>102873238
>>
>>102873257
Where usable models?
>>
>>102873151
Damn, this means that bitnet really does work, they have the models internally, but for some reason they are not willing to release them. Very sus.
>>
>>102873151
Bitnet bros we're so fucking back
>>
Didn't llama.cpp already have (early?) bitnet support?
I think it was based on the code to run ternary quants.
>>
Were the schizoposters right? Nvidia is literally forcing everyone to hold back release of serious bitnet models?
>>
>>102873270
Because it would tank leatherman's njudea stocks very bad
>>
>>102873365
Seems plausible, better deals on hardware if you only focus on GPU inference
>>
>>102873365
China's Chip revolution despite US sanctions will force the gates open! Cheap GPUs for all!
I trust and believe!
>>
>>102873435
Or they'll just turn into greedy cunts as soon as they have a product breakthrough
>>
>>102873435
>despite
caused by
>>
>>102873365
>>
>>102866352
holy sovl

do you have any more chat logs?
>>
>>102873270
>>102873365
>The tested models are dummy setups used in a research context to demonstrate the inference performance of bitnet.cpp.
>>
>>102873151
NOTHINGSISTERS NOT LIKE THIS AIEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
>>
>>102873687
see >>102873640
>>
>>102873388
>better deals
More like delaying GPU orders for companies that refuse to comply with leatherman's demands.
>>
>>102873622
Everybody was saying that bitnet isn't worth it because it costs the same to train as f16, but that's bullshit because it would save millions in actually serving and running the model. JudeoVidya strikes again...
>>
>>102873745
Exactly. Once trained, the brunt of the cost is in actually running the thing.
Plus, I have my doubts regarding that claim to begin with.
>>
It seems Microsoft was the first company to realize the best way to get local users to fuck off was to give them good models that can be run easily so that they shut up and lose interest in the scene
>>
>>102873755
The training claim holds some water because we don't have chips with ternary operations yet. Everything would need to be done in software first. But I think that once the initial work was put in, it would've paid off massively. Hell, maybe all the major companies already have their own implementations.
>>
big bitnet models when
>>
>>102873858
When companies can tell Jewidia to fuck off.

I've seen some people say Nvidia would benefit from Bitnet, and that's correct, but if there were a bunch of good big bitnet models out there it would tank consumer interest in GPUs and API services
>>
Nemotron is good but ultracucked.
>>
>>102873858
I'm looking forward to seeing a big BitNet MoE model that can run fast on CPUs. Then NVidia can fuck off.
>>
>>102873914
This. If everybody could easily run 100B+ at home, nobody would be interested in cloud models anymore. So even big corpos won't shoot themselves in the foot by releasing bitnet models. You will own nothing.
>>
>>102873960
I'm sure somebody will do it eventually. Despite Sama's best attempts, AI is still a pretty competitive space where companies have a reason to undercut each other. But it'll take a while
>>
meta research dump
https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua/
>Meta Spirit LM: An open source language model for seamless speech and text integration.
>Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
>Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
>SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
>Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
>Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
>MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
>Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.
>>
>>102873858
two more miku weeku *pees on your router*
>>
>>102874089
a full load of nothingburgers
>>
File: 1729268279039.jpg (40 KB, 384x384)
>>102873858
Here is your Janus bitnet Miku.
>>
Microsoft is trying to grab the inference engine monopoly with this bitnet.cpp thing. Llama.cpp devs better add their own support, or they gonna get demolished.
>>
>>102874155
>Implying we're gonna get big Bitnet models anytime soon
>>
What could 64GB of RAM run with bitnet?
>>
>>102874155
They are comparing performance TO llama.cpp; it has already had an implementation for months, just not as fleshed out and fast (on cpu).
>>
>>102874175
The GitHub shows they ran a 100B model on 64GB of RAM
>>
>>102874205
I can do that slowly at IQ3XXS now
>>
There has to be more news than this.
>>
>>102874222
*Not 100B, Largestral
>>
>>102874222
Bitnet matches FP16
>>
File: ED.jpg (435 KB, 2125x1411)
>>102866558
>helpful for NOT jerking off?
All of them.
>>
>>102874089
Now watch llama4 be another ScaleAI provided, benchmaxxed model that's barely better than the original GPT4 (on the bench)
>>
>>102874254
So I won't be able to run bigger models, but what I can run will be at usable speed and higher quality?
Can bitnets be quanted?
>>
>>102872746
>Nemotron 70B is so good, I feel like crying
What did it do specifically? Or are you doing what I did a while back with dark Miqu and you are shilling the model you didn't even download just to fuck with people here?
>>
>>102872488
I am mad a scammer is just walking around in the open and people still pay him money instead of lynching him.
>>
>>102874290
AFAIK no. But yeah basically you can fit FP16 level intelligence and achieve the same speeds you do right now
>>
Imagine being an /lmg/ newfag and hearing all the worship of bitnet but not knowing what people are even talking about.
>>
What are the odds that the chinks will make a 100B bitnet model?
>>
>>102874530
They will release a 100b bitnet model once some other company releases their own 100b bitnet model, as it's always been.
>>
>>102874390
No honor in being /lmg/ oldfag.
Imagine being proud of going through 3 generations of llama models. People will make fun of you.
>>
>>102874612
I got in after l2 launch. It was finally good enough to fool me into it being worth it. It wasn't.
>>
I was here since Pyggy. I was spending hours just to generate logs on CAI for pygmalion lol
>>
>>102874612
I remember trying to run BLOOM on my laptop.
Grim time
>>
File: bitnala2.png (107 KB, 1040x360)
So I've been trying to simulate a nala test with the bitnet inferencing thing...
My base model prompting is a bit rusty. But yeah, I feel like there are more underlying issues than that.
The only sampler it lets you control is temp. And it just repeats the same few tokens over and over again if it's too low, otherwise this is at t=1.2
But here's the world's first published bitnet Nala Test (on the Llama-3-8B one)
I assume the magical quantization process they used basically fucked up the model outside of the evals. I might try with one of the proper bitnet from scratch test models in the future but I have to go to work now.
>>
>>102874688
>I assume the magical quantization process they used basically fucked up the model outside of the evals
Yeah, probably.
What I want is to see a bitnet model trained from scratch, not a quantization scheme.
>>
>>102874688
Also inferencing speed is 100% thread bound with this. like 100%. t=physical cores is fastest.
But there's no way in god's green earth anything shy of a 196 core epyc CPU is going to get 7 token/sec on a fucking 100B model like they claim. Expect like 0.1 token/sec on your desktop 6 core.
>>
>>102874688
It's not really the same as quantization, notice that it's only trained on 100b tokens
>>
>>102874688
History is being made.
I'm so glad to see this monumental work done by us, a new era of AI begins now.

Honorable mention to Microsoft for throwing a couple of scrips together.
>>
>>102874747
>He didn't even check the graphs
>100B benchmark on an Intel CPU with 6 P cores got 1.70t/s

Brainlet-kun I...
>>
>>102874688
This feels like GPT3.5 turbo when I tried it when the API first came out. Repetition out the ass after the first message. WAGMI
>>
>>102874847
That Llama 8B model was just a conversion. The guys who made it said it matched Llama 1 7B in intelligence. Proper Bitnet's supposed to be equal to or better than FP16
>>
>>102874832
I like your sarcasm. I hope all of the faggots here are just pretending to be retarded at this nothingburger.
>>
>>102874957
What will the next cope be when the 100B model drops?
>>
>>102874989
The 100B model sends shivers down my spine.
>>
>>102875002
You're right Anon, every technological advancement is a nothingburger when compared to your sheer inability to prompt and ban tokens
>>
>>102874989
When a 100B model drops I will be happy. Making a framework for bitnet and quantizing existing models into it means nothing, because that is not the reason bitnet makes sense. It is something you would do if you want investment money from a retard who doesn't know anything about computers. 1B toy models also mean nothing. This is basically building a railroad network before trains are invented. Nice to have but worthless for now.
>>
>>102875023
I can't ban tokens in open webui.
>>
>>102875041
You're right in that quantizing existing models is fucking worthless, but undeniable confirmation that it works as expected is pretty huge.

I don't expect any company in dealings with Jewidia to drop a 100B model but Chinks will come up with something soon
>>
>>102875065
>I don't expect any company in dealings with Jewidia to drop a 100B
Models expand to fit resources. If they start experimenting with proper bitnet models and it works well enough, they'll just make 5-10T param models and release some "small" 100B parameter models for the masses.
Chinks haven't come up with anything since fireworks.
>>
Qwen said they were looking into bitnet.
They have the compute to train a whole range of models from 1B to 100B, and could probably train an 8B bitnet in a few days.
So why isn't there one?
They are controlled.
>>
>>102875112
There are definitely 5-10T param models in the works now but

>Release 100B textgen model publicly
>Customer interest in AI and API services tanks

It's like hanging yourself with one hand and shooting yourself in the dick with the other hand. You're souring your relationships with Nvidia and hurting your own business model
>>
>>102873151
>thing.cpp
>it's actually Python
>>
File: 1707856899638855.png (618 KB, 1206x880)
>>
>>102875139
>Customer interest in AI and API services tanks
Remote models will always be faster and bigger. They have the hardware to run ridiculous models. Most normies I know AFK have a shitty 10 year old laptop with 4-8GB RAM. The ones that upgraded have an entire 16GB of VRAM and like fwaaaaa 32GB RAM... And none of them know what a 'github' is. Fuck, we have retards here every day that don't know what a python venv is or cannot process errors on their terminals when they try to load models with huge contexts.
Don't underestimate the allure of convenience for normies. We are not the average.
>>
>>102874327
No, I'm using it for real. This model is definitely something unique. What makes me say this is that the model seems to actually understand the context, it knows what "teasing" means, it doesn't jump on your dick when a character is just teasing. It also seems to be able to pull things from the character card to make the messages more interesting, like, in my character description there's a line that says "she has a c-cup chest and doesn't wear a bra", and in one random message the LLM wrote "She rolls onto her back, still grinning, and stretches, arching her back in a languid motion, which, given her lack of a bra, momentarily draws attention to her C-cup breasts".
This is so different from the usual LLM slop I get from models like Largestral, it's very refreshing.
>>
>>102875221
This. The average masses will only be interested in local models if they can download a 100B app from the playstore on their phone.
>>
>>102875215
What is this schizo shit
>>
Ministral Large bitnet
>>
>>102875215
wasting 1b on grok 3 is such a fucking waste holy shit. elon should've bought more dogecoins....
>>
>>102874634
I'm curious what people saw in Llama 2 back in the day. I couldn't see a difference between L1 tunes and L2 tunes no matter how hard I tried, apart from the obvious thing (context length)
L3 has some issues too, but at least it's obviously a step up in intelligence
>>
>>102875387
Llama 2 7B was better than llama 1 13B
>>
>>102875417
Llama 3 8B, sure. Llama 2 7B? Get outta here.
>>
>>102875451
Llama 1 was very bad anon. Although, I guess I should've said "as good as", not better.
>>
File: 1729242567362067.png (462 KB, 512x768)
I do not want llama, nor mistral. I want mikusex.
>>
>https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua/
>SAM 2.1
>Meta Spirit LM (Speech2Speech aka local GPT-4o)
>Meta Open Materials 2024
>Self-Taught Evaluator
merry early christmas
>>
>>102875631
Not open weights. go back, buy an ad
>>
>>102875651
>Not open weights
It literally is. Click the link.
>>
>>102875631
>speech2speech scores 40% on MMLU
monkey paw curls once more
>>102875651
it is you mongoloid
>>
File: spirit-lm-training.png (34 KB, 762x240)
>>102875694
>speech2speech scores 40% on MMLU
to be fair it was trained on a pitiful amount of data
>>
>>102875631
>We released the model trained with direct preference optimization, which is a strong generative reward model on RewardBench, despite not using any human annotation in training data creation. It outperforms bigger models or using human-annotated labels, e.g. GPT-4, Llama-3.1-405B-Instruct, and Gemini-Pro. The model is also available as an evaluator on the AlpacaEval leaderboard, as one of the top-ranked evaluators in terms of human agreement rate while being around 7x to 10x faster than the default GPT-4 evaluator.
Big
>>
>>102875854
Isn't it for training only?
>>
>>102875545
>>
>>102875631
chatgpt, summarize what this says
>>
File: 1724295535744932.png (22 KB, 656x163)
All the layerskip models from Meta are just their old models that had some continued pretraining done to them. If we can pool together a couple dozen thousand we can get layerskip mistral large.
>>
>>102876085
This space really moves too fast, I have no fucking clue what's going on anymore.
My mind is still stuck somewhere on dynamic temperature.
>>
>>102876085
Doing a continued pre-training of Largestral wouldn't be very easy or cheap...
>>
>>102876121
Not fast enough!
>>
>>102876121
Why is your mind stuck on a meme sampler that wasn't a significant point in the LLM story, like, at all? So much that it was quickly forgotten?
>>
>>102876253
Maybe that's the exact point when I got brain damage from placebo overdose.
>>
>>102875631
The only thing there that is actually a production release is SAM 2.1. The other stuff is mostly just pure research artifacts that aren't for end users.
>>
Layer Skip will save LLMs. A model can now make use of speculative decoding using its internal layers. This can be added to any existing model. All we have to do is to figure out how to do this via finetuning and inference speeds will almost double.
We haven't been so back since 2023
>>
Does speculative decoding reduce quality? Is it just guessing what's next?
>>
>>102876489
Why not use a smaller, pretrained model instead?
Llama.cpp can do that already, in fact.
>>
>>102876500
>Does speculative decoding reduce quality?
No.
>Is it just guessing what's next?
Yes.
If the smaller model guesses wrong, it just slows down generation.
>>
>>102876500
It's guessing what's next and if it's wrong it discards the guess. No impact on quality, but it can increase speed.
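The whole trick fits in a few lines. Here's a greedy sketch; the model names are stand-ins that happen to share a vocab (which is a requirement), and real implementations use rejection sampling so it stays lossless when you sample too:
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                  # shared vocab is required
draft = AutoModelForCausalLM.from_pretrained("distilgpt2")   # small, fast guesser
target = AutoModelForCausalLM.from_pretrained("gpt2-large")  # big model whose output we keep

@torch.no_grad()
def speculative_step(ids, k=4):
    n_in = ids.shape[1]
    # 1. draft model guesses k tokens cheaply
    guess = draft.generate(ids, max_new_tokens=k, do_sample=False)
    # 2. target model scores all the guesses in ONE forward pass
    logits = target(guess).logits
    # 3. keep guesses while the target agrees, discard the rest
    n = 0
    while n < k and logits[0, n_in + n - 1].argmax() == guess[0, n_in + n]:
        n += 1
    # the target's own next token comes free, so we always gain >= 1 token
    next_tok = logits[0, n_in + n - 1].argmax().view(1, 1)
    return torch.cat([guess[:, :n_in + n], next_tok], dim=1)

ids = tok("The quick brown fox", return_tensors="pt").input_ids
for _ in range(8):
    ids = speculative_step(ids)
print(tok.decode(ids[0]))
[/code]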
>>
File: 1707269706041483.png (718 KB, 1656x581)
https://x.com/doomslide/status/1847344776376365065
>>
>>102876583
>>102876583
>>102876583
>>
>>102876501
Maybe when a smaller model with the same vocab doesn't exist; doing this could be a cheap solution there. Might also be interesting to have a small model use this technique to be even faster.
>>
>he fell for the bitnet meme



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.