/g/ - Technology


File: IMG_0087.jpg (862 KB, 1488x1317)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103298520 & >>103286673

►News
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
>(11/21) Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: tetrecap2.png (1.11 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103298520

--Paper: MambaIRv2: Attentive State Space Restoration:
>103308827 >103309055 >103309276 >103309351 >103309388 >103309778
--Papers:
>103308752
--llama.cpp speculative decoding update discussion:
>103303609 >103303634 >103303641 >103303672 >103303718 >103303799 >103303990 >103304141 >103304290 >103304378 >103304384 >103304450
--Qwen2vl-Flux image generation model discussion:
>103311018 >103311143
--Impressions and issues with Llama-3.1-Tulu-3-70B model:
>103304919 >103304975 >103304999 >103305041 >103305102 >103306035 >103306464 >103309098 >103309106 >103311939 >103309321 >103311182
--Anon releases PEFT and invites improvements:
>103310510 >103311152
--Anon asks if they can power a Tesla with a spare CPU cable:
>103299698 >103299909 >103300124 >103300180
--Allen AI's AGI achievement and its implications for model development:
>103309146 >103309209 >103310351 >103310222
--Purpose of model warmup during initialization:
>103310115 >103310188
--Optimizing cpumaxxing performance:
>103306785 >103306937 >103307679 >103307706 >103307356 >103307347
--Olmo models and language modeling methods:
>103305802 >103306676 >103306690
--New TTS model OuteTTS 0.2 500M, but Anon is unimpressed:
>103300622 >103300862
--Getting LTX video working with CLI workflow on A40 48GB card:
>103304753 >103304763 >103304841
--Anon shares info on Reflection-70B and AI misinformation:
>103310430
--Anon shares a passage from Samuel Butler's 1872 writing on mechanical consciousness:
>103302296 >103302328 >103302546
--Anon seeks help limiting abusive LLM usage:
>103303509 >103303580 >103304236 >103304253 >103304900
--Anon gets 38% speedup with speculative decoding on llama-server:
>103306207 >103306256
--Miku (free space):
>103298713 >103298723 >103299940 >103300594 >103302609 >103302833 >103303994 >103309499

►Recent Highlight Posts from the Previous Thread: >>103298523

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 1707929546239541.png (972 KB, 596x596)
Omgg it's migu
>>
>>103313004
you could use qlora and tune it for cheap
papers say that qlora is really good
>>
we teto now (again)
>>
>>103313019
yeah, on benchmarks. qlora is a meme on real world scenarios
>>
>>103312989
What an absolute dogshit recap.
>>
File: TodayIsTheDay.png (1.17 MB, 1280x768)
Good morning lmg!
>>
>>103313050
>qlora is a meme on real world scenarios
there is still no proof for this just like the 'modern models don't quant well and lose performance even at 8bit' meme doomposters love to repeat
>>
>>103313053
kill xirself
>>
>>103313053
Omgg it's fartsune shitu
>>
>>103313053
Good morning Miku
>>
>>103313076
https://arxiv.org/html/2410.21228v1
>>
File: StunnedSilence.png (1.11 MB, 1280x768)
>>103313082
>>103313098
>>
>>103312989
Thank you Recap Teto
>>
>>103313114
>"training" less parameters is less effective than training all parameters
holy fucking shit
>>
File: monkey lolipop.jpg (54 KB, 900x900)
>further vindicated about kobold seemingly breaking the fuck out of every model last thread
>decide maybe it is time to break from it and try something else for once
>remember there aren't any alternatives
>remember again how hyper fixated autistic this thread was about oobaballs at the start of the year
>go look at its git

>last update one month ago
>still outstanding pull requests from earlier this year

why though?
>>
>>103313162
not just that, the paper says it creates "intruder dimensions" inside models even when training a high rank lora which make parameters worthless and literally lobotomize knowledge out of your model
>>
>>103313177
>last update one month ago
I use booba, love booba, but the pace of dev is pathetic. Even the dev branch has hardly anything of interest in it. The project desperately needs someone with vision and drive to keep it from irrelevancy.
>>
>>103313220
Read more, it says it is mostly circumvented by doing it the way people have been doing it for a year.
>>
>>103313177
llama.cpp server seems to be lower latency and supports all the fun samplers. No reason to use all the derivatives when it's that functional, unless they have a killer feature you need.
>>
>>103313244
>mostly
never going to touch a l(obotomized)ora again, sorry
>>
Hatsune Miku is the shittiest waifu there is and I am tired of pretending otherwise.
>>
So can I use llama 1B as the draft model for any llama model like lulu? Can it also be a quant or does it have to be full precision?
I'm working with 44GB memory total so it's hard to fit a decent 70B quant and draft at the same time.
>>
File: saintmakise.jpg (236 KB, 1614x992)
>>
https://www.reddit.com/r/LocalLLaMA/comments/1h0ckut/we_just_launched_sentient_a_completely_local/

>"The more you chat, the more the model improves. The training happens on the global model, so your interactions are contributing to the overall improvement of the model."

>The global model aggregation here refers to a technology called Federated Learning - wherein we don't take any data from the user but simply take the updated weights of the model after fine-tuning and aggregate them on a central server.

>So it's basically decentralised fine-tuning, powered by everyones data and secured by blockchain.

>This is very good! I find it interesting how it pulls data from linkedin to paint a clearer picture about you.

>That repo has been setup for v1.1 which will include auto-updates for future releases

>XD well, I'd appreciate it if you tried the demo seeing as how you've already downloaded it :)

>we're trying to make this tech accessible to even non-technical people. That's why we ship with all binaries and dependencies packaged into our installer
>>
>>103312983
Ah, tuesday, yes. Ohio.
>>
>>103313114
>Even at high adapter ranks and with rank stabilization, we find across layers that the effective rank of LoRA updates is less than half that of full fine-tuning and a quarter of the adapter rank. For example, with the high rank of r=768 for RoBERTa, LoRA updates have an average effective rank of 300. This suggests that LoRA is under utilizing its full capacity r, and may help explain observed gaps between LoRA and full fine-tuning on challenging tasks like coding
Damn, I guess it's over for vramlet fine-tuners.
>>
>>103313266
By 'mostly' it's like 99.99%, but sure. Never take another step, because there's a greater chance than that of tripping and dying from it.
>>
>>103313177
I switched from kcpp to llama-server several months ago and it's been great.
I'm pretty sure I get slightly better performance too, although it's been a while since I last benchmarked that.
>>
>>103313243
>vision and drive to keep it from irrelevancy
Dead hobby. The corpse of this hobby is a vehicle for troons posting their green haired autogynephilic icon.
>>
Is this "speculative decoding" thing for some models or all models? People in llama.cpp pull #10455 are only mentioning Qwen.
>>
>>103313328
All models as long as you can get a smaller model with the same tokenizer and vocab as the main one, from what I understand.
>>
>>103313310
>pulls data from your linkedin profile to construct a knowledge graph about you
>>
>>103313328
You can use any model as long as it has multiple parameter count variants with the same tokenizer.
>>
File: 1701779607563578.png (1.74 MB, 1188x712)
Omgg
>>
>>103313248
>unless they have a killer feature you need
I like kcpp's slop list "ban", even if it's not perfect... But I can also say that trying lcpp does give me different results for the same model, not sure yet if they're better though.
>>
>>103313310
>they hooked up a 3b model to a vector db
stop the fucking presses
>>
>>103313289
>So can I use llama 1B as the draft model for any llama model like lulu?
As long as the tokenizer is the same.
>Can it also be a quant or does it have to be full precision?
You can quant it. As usual, probably anything down to Q4 should be fine.
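For reference, the command is roughly shaped like this (filenames are placeholders and the exact flag spellings can differ between llama.cpp builds, so check llama-server --help):
llama-server -m llama-70b-Q4_K_M.gguf -md llama-1b-Q8_0.gguf -ngl 20 -ngld 99 -c 16384 --port 8080
-md points at the draft model and -ngld sets how many of its layers go to the GPU; keeping the small draft fully offloaded is where most of the speedup comes from.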
>>
>>103313328
It works with all models but you need to have a small model that's 'similar' to the big model you want to run to mitigate the loss in quality you get from having the big model rewrite the dumber gens the small model does.
>>
LMG... I... kneel
>>
>>103313348
>>103313339
>>103313310
Engage backpedal
>FL is also just something we're researching rn - it may never exist in future versions, we just wanted to put it on the site to see what people thought of the idea
>>
File: file.png (33 KB, 893x327)
>>103313365
https://www.reddit.com/r/LocalLLaMA/comments/1gzm93o/speculative_decoding_just_landed_in_llamacpps/
What is the "previous speed"?
Spec decoding faster than the big model alone? Small model? Between the speed of small/big models? Sorry if dumb question, I just crawled out of a rock today.
>>
>>103313314
Also saving like 90% compute/memory to get 80% of the effect is completely worthless and no one should ever do that
>>
>>103313365
>mitigate the loss in quality you get from having the big model rewrite the dumber gens the small model does.
that's not how it works retard
>>
>>103313440
Faster than the big model alone, but obviously slower than the small one
>>
File: 1724855669831409.png (6 KB, 340x153)
>>103313453
>>
>>103313387
It gets better
>FL has a lot of cool stuff we can implement like differential privacy but our end goal is to eliminate the server hosting the global model and go for full-blown blockchained federated learning

>all training will happen on your pc, so your data stays on your pc - it's just the model weights that will be aggregated on the blockchain

>again, just an experimental feature we are developing internally - it's not in the app right now and won't be there in the next few versions either

>all training will happen on your pc, so your data stays on your pc - it's just the model weights that will be aggregated on the blockchain
You do the tuning on your pc! Yay local win, then we get the benefits of a better model, made with your compute!
>>
Autoround quantization looks promising:
https://www.reddit.com/r/LocalLLaMA/comments/1h0aev6/lossless_4bit_quantization_for_large_models_are/
>>
https://x.com/kimmonismus/status/1861440503864049800
SORA GOT LEAKED, I REPEAT THIS ISNT A DRILL, SORA GOT LEAKED
>>
>>103313336
>>103313340
Why the fuck is that so? You just need to tokenize the small model's output with the large model's tokenizer, problem solved
>>
>>103313507
>lossless
There's obviously some loss.
>>
>>103313534
Correct, that just wasn't implemented yet. You are free to make this happen though.
>>
>>103313546
Wrong
>>
>>103313528
Who gives a fuck? It's shit that nobody wants anyway.
>>
>>103313528
>OAI jews out so hard the artists leak it
Poetry.
So when are we getting the claude leak(s)?
>>
>>103313528
Fuck your twitter link. Post the magnet or fuck off
>>
>>103313528
>simple openai api calling HF space
>>>leaked
Ai grifters losing creativity.
>>
>>103313528
downloading it right now, hope it runs on 72gb
>>
>>103313528
I bet this is just an API link and the artists retards don't even know they can just take the API link down.
>>
>>103313528
>>>/g/ldg
>>
>>103313555
It's right there in the graph.
>>
>>103313528
>>103313563
No weights

>>103313555
4bit: 82.98
full: 83.44
That's not lossless
>>
>>103313583
That's minuscule.
>>
>>103313528
https://huggingface.co/spaces/PR-Puppets/PR-Puppet-Sora
Am I missing something or is this just a demo with an API proxy and no weights were linked?
If anything this looks like a stealth marketing campaign like strawberry and reflection. Fuck this shit.
>>
>>103313463
You don't get a loss in quality though, it's at most a loss in performance.
>>
>>103313528
>no weights
Fuck off.
>>
>>103313594
Still a loss.
>>
>>103313440
man, I can't get this shit to work on my potato. Oobabooga works though, loads models no problem.
>>
>>103313595
>>103313583
>>103313576
>>103313572
KEEEEK
this proves once and for all that these morons barely have a single working brain cell.
>>
>>103313598
Is there literally any proof of this? How does a small model 'draft' something in a way that doesn't hurt the big model? Sounds like the 5000th edition of AI snake oil.
>>
>>103313619
>these morons
which morons are we talking about? there are so many...
>>
>>103313528
why did you even share this? you're behind /ldg/ by like an hour retard
no weights, so it isnt even a leak. probably a really shitty PR stunt.
>>
>>103313640
>probably a really shitty PR stunt.
>PR Puppets
guaranteed
>>
>>103313622
It's simple, when you use a draft model you can check if the tokens are correct in parallel, but when you're generating without a draft, you have to generate tokens in sequence. That's it, and that's why there's no loss in quality.
>>
>>103313658
But what if the small model has a fundamentally different idea of what is correct due to being a tiny retarded model?
If there somehow is no loss in quality, then the output of running a big model with/without draft must surely be exactly the same in a deterministic environment, correct?
>>
>>103313622
Some people said it's like speculative execution on the cpu
I don't know how gpu speculative decoding works, but on the cpu side it basically allows the cpu to ignore a bunch of conditional checks and pretend that the most likely outcome is the current outcome. Obviously, if the cpu later realizes that it fucked up, it rolls back the changes, but modern cpus are extremely good at predicting those branches as programs are generally fairly repetitive - maybe language is the same, if the amount of gpt slop is anything to go by
>>
Largestral V3 - bench cooked, boner killing schizoshit.
Tulu-3 - NFP, no investors to impress with meme marks, pure dick-ruining smut kino at just over half the size
Yeah I'm thinking benchmarks are killing LLMs.
>>
>>103313710
As proven by the fact that we still don't have a better model than Midnight Miqu after a year, yes.
>>
File: 1715830787598652.png (336 KB, 3000x2100)
>>103313507
Doesn't seem to beat regular quant methods?
>>
>>103313693
Yes, it should be exactly the same except for differences caused by rounding errors, as CUDA Dev said last thread: >>103303718
>>
>>103313693
As long as the implementation is done correctly, there can't be a quality loss, as the bad predictions from the smaller model are just disregarded.
>>
>>103313710
If only Mistral still released base models, we could have eventually had a Tulu Largestral.
>>
>keep getting ugly as shit faces
>wondering why when i haven't had this issue before
>finally hits me
>they look kinda like One Piece characters
>"one piece dress"
>>
Tulu verdict?
>>
>>103313310
>screenshot
>Your name is Varad Deshpande, an aspiring Full-Stack Web Developer
>You often use informal language and colloquial expressions (e.g., "bhai" instead of "brother").
>>
teto save me
>>
>>103313747
easy winner for the meme model of the week award
>>
>>103313739
kek
>>
>>103313747
not as good for characterization as other models... also a bit of a turn stealer in RP. But the smut it produces is out of this world.
>>
>>103313747
Made the few finetooners left itt shit their pants and now they're doing PR against it.
>>
>>103313696
>Some people said it's like speculative execution on the cpu
This is sort of correct. The way it works is like this:
1. Small model generates a sequence of tokens for the current context (as if the CPU ignored branches and continued executing)
2. Big model checks if each token of the sequence is something it would generate by itself, in parallel. (As if the CPU is checking each branch to make sure that the prediction was correct)
3. If any of the drafted tokens don't match what it would generate, the generation ignores that token and the remaining drafted tokens. (As if the CPU rolled back to a previous branch)
>>
>>103313747
Not that impressive considering what we have now, but it's impressive that they managed to unfuck llama 3.1
>>
>>103313787
That was first done by Nemotron though.
>>
>>103313747
Its super liberal with EOS tokens.
You often need several gens in rapid succession to get all the things the model was instructed to do. I can see some potential here for game engines since it appears fairly consistent so far in some of my larger experimental RPG simulator cards.
>>
File: Event71.jpg (56 KB, 800x600)
>>103313710
We just need some benchmarks that are relevant to ERP. You know, talking during deepthroating, sparks behind closed eyes, sucking your dick while being fucked by you, looking into your eyes during a rim job, I could continue forever. Though it shouldn't be this straightforward in the public examples, it could be replaced with something SFW, but private tests must be run with hardcore smut.
>>
im switching to llamacpp from kobold but

how do i automate the settings from kobold so i dont have to keep making batch files for all the settings from scratch every time i want to launch models?
stupid question but this is why we (i) use kobold in the first place, just making the basic .bat to run llama has already proven to me kobold must be brutally raping every model because i had a real nice time talking to the generic assistant of the model im trying with nothing changed
>>
>>103313177
Don't fix what's not broken
>>
>>103313822
>Its super liberal with EOS tokens.
I had the opposite issue. It wants to write like a thousand tokens before giving me a turn in RP.
My guess is you buggered up your prompt template.
It goes from bos to <|system|> to <|user|> to <|assistant|> and then the eos goes at the end of the assistant message. If you eos/eot between the steps you are introducing superfluous stops that it might be confused by.
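Rendered out, that layout looks roughly like this (just a sketch; the exact special-token strings and newlines are whatever the chat_template in the model's tokenizer_config.json says):
<|system|>
{system prompt}
<|user|>
{user message}
<|assistant|>
{assistant reply}{eos}
<|user|>
...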
>>
>>103313853
on this note I don't get why the fuck they are trying to create a new proprietary prompt format. Just use Llama-3 native ffs.
>>
>>103313747
Its surpassed nemotron for RP now. Its dirty and creative and actually knows how to advance a plot unlike most other models
>>
>>103313853
isn't the hf conversion script supposed to pull all that crap from the json files into the gguf?
>>
>>103313868
It's not new, I think that's the prompt format used by Phi-3.
>>
>>103313883
It would make too much sense to have a standard. These can be wrong (and I think they are in this case). For example, mistral models also have fucked template and company recommends their python lib to do tokenization.
>>
Thanks to the anon who made me check llama.cpp out, it's actually much more fun to use for RP than koboldcpp. I guess it was stupid of me to trust a project maintained by a single guy.
>>
>>103313890
Kind of. I just looked it up
It's the same tokens as Phi-3 except it removes the end token between steps.
>>
>>103313917
Oh and it uses the Llama-3 <|begin_of_text|> token as a bos
>>
>>103313890
>>103313917
Phi uses <|end|> not <|endoftoken|>
Why the fuck don't they just use Phi tags if not Llama 3?
>>
endoftext I mean
>>
>>103313950
Maybe the key to making a better LLM is having a clusterfuck of a half-assed prompt format copied from multiple sources.
>>
>>103313950
I have to assume they did testing and went with what worked best.
>>
>>103313969
KEK
>>
>>103313969
I mean I've always held that pretending the model is doing anything but just completing text is kind of dumb. So by removing an end of turn indicator between steps they are kind of removing some of the retarded fluff.
>>
>>103313980
They put out the best vision and now the best "assistant" tune so im going to assume they know what they are doing compared to local shittuners.
>>
>>103313969
boolshit
on the other hand, I notice deepseek v2.5 also only has end token for assistant turn
<|beginofsentence|>{system_message}<|User|>{user_message_1}<|Assistant|>{assistant_message_1}<|endofsentence|><|User|>{user_message_2}<|Assistant|>
>>
>>103313969
Regardless of the text representation, it's still a single token. You can even rename it in the tokenizer configuration and use the new one.
>>
>>103314018
So they copied that part from DeepSeek.
So we have the Llama-3 BOS token, the Phi-3 header tokens, and the DeepSeek format, essentially.
>>
>>103314018
DeepSeek's format wins the award of the shittiest prompt format.
>>
File: 1731790602582178.jpg (264 KB, 1861x1408)
I feel a strong sense of déjà vu in this thread. Every damn time.
>>
>>103313710
noooo but largestral is good, if it's not good then what did I spend $5T on 5000000 H100s for?!?! It MUST be good
>>
best draft model for largestral 2 speculative decoding?

Mistral-7B-Instruct-v0.3-GGUF
or
Ministral-3b-instruct
>>
>>103314117
Isn't the new Largestral simply a cheap fine-tuning of the exact same base model?
>>
>>103313839
I used to use .bat launcher for llamacpp, but nowadays I just make use of the terminal history and swap out parameters as needed
I don't try a new model every 2 seconds so if I want to use a model I haven't used in a few days, I can just hit the up arrow a few times
>>
>>103314117
It's good according to the benchmarks and that's all the investors care about, and that's basically the whole grift, really.
Mostly money from directed retirement savings.
Boomers:
>That them newfangled AI stuff is the future. I should move some of my RRSP funds into there.
And so the fund managers don't really care whether the models know whether you can talk while a dick is tickling your tonsils they just look at whichever AI has the best meme marks and send it to that corpo.
>>
>>103314138
nevermind, ministral 3 and 8b are new and better but llamacpp doesnt support them yet
https://github.com/ggerganov/llama.cpp/issues/9914
>>
>>103314117
2nd is sota, 3rd they fucked up something majorly somewhere
>>
>>103314048
*openchats in your path*
>GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
>>
>>103314138
Mistral-7B-Instruct-v0.3 uses the same tokenizer as Mistral Large 2407, so there are no other options here. Ministral uses a different tokenizer and 2411 has no matching small models at all
>>
>>103314204
I felt like it was an improvement overall. Although you have to nail the prompt template down (a single whitespace or lack thereof makes the result wildly different).
>>
>>103314187
>ministral 3
>>103314138
>Ministral-3b-instruct
The one on hf isn't the one you think it is
https://huggingface.co/ministral/Ministral-3b-instruct/tree/main
>8 months ago
>Finetuned from model: mistralai/Mistral-7B-v0.1
>>
>>103314150
this shit makes my brain go numb, how do i add the ctx line properly? trying to give it 32768 but it keeps telling me im doing it wrong
>>
>>103314269
show your command maybe?
>>
>>103314289
i got this shit to run by removing the "-ngl N," part of the command (how was i meant to know i shouldnt do that?) but it still didn't pass the 30 layers, even if in the command line it showed it was
anyway here's my .bat i give up, back to kobold kek
llama-server -m L3.1-Dark-Planet-SpinFire-Uncensored-8B-D_AU-Q6_k.gguf --port 8080 --n-gpu-layers 30
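For anyone else hitting this: the context size is just one more flag on that same line, e.g. (assuming a reasonably recent llama.cpp build; run llama-server --help if it complains about the spelling):
llama-server -m L3.1-Dark-Planet-SpinFire-Uncensored-8B-D_AU-Q6_k.gguf --port 8080 --n-gpu-layers 30 -c 32768
-c / --ctx-size sets the context window in tokens.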
>>
>run llama.cpp server
>try to talk with any character
>nothing happens
huh? did anyone else have this problem?
>>
As expected, because Allen AI just had to use a snowflake prompt format, it does not merge nicely with other models. Sad. Meaning you'll have to set up a tutoring pipeline if you want to distill any of that delicious smut on another model.
>>
>>103314467
only you as far as I can tell
>>
just tried tulu 3 at Q8 and it's retarded, extremely poor logic and commonsense in story prose
why do you guys consistently have the WORST, dumbest fucking taste and why do I still bother to listen to any of you
>>
>>103314566
Post log or be known as a liar. It did rpg cards and non human smut with blazing colors for me.
>>
Holy shit cpu bros we are so back!
>qwen 32b
>old llama-server 2t/s
>updated llama-server 1.75t/s
>updated llama-server with drafting 3t/s
>>
>>103314222
Did Mistral 7B w/ Mistral Large 2407 work for you? I tried earlier but got a tokenizer mismatch error. I remember someone in the last thread mentioning a patch for this but haven't seen any info about that anywhere
>>
running

llama-speculative.exe --model "Mistral-Large-Instruct-2407-Q4_K_S-00001-of-00002.gguf" -md "Mistral-7B-Instruct-v0.3-Q4_K_S.gguf"

gives me error

"main: draft model vocab must match target model to use speculation but token 10 content differs - target '[IMG]', draft '[control_8]'"

both models downloaded from bartowski

do i need mistral 7b 0.2 instead of 0.3?
>>
>>103314581
nta, but i'm curious as to what that rpg card looks like
i think i've seen(you) or someone else mention it previously
>>
>>103314581
bullshit, FUCK you
you're one of those people (and so I bet is everyone shilling this model) who, when his dick gets hard, just stops noticing when the model outputs basic logical errors or non sequiturs or describes biologically impossible body positions
>>
>>103314479
There's no need for a "tutoring pipeline", retard. They open sourced their dataset.
>>
>>103314619
https://rentry.org/CharacterProvider-CYOARPG
>>
>>103314566
There's maybe 2 or 3 active posters here that could run that at Q8 and I'm one of them. So I can call your bullshit with 99% certainty.
>>
File: no homo, yo.jpg (193 KB, 860x1290)
>>103312983
>>
>>103314611
I don't have enough VRAM for the draft model with TP. If only the draft model could be split across GPUs...
>>
>>103314595
>>old llama-server 2t/s
>>updated llama-server 1.75t/s
the fuck happened there?
>>
>>103314640
>what is cpu offloading
moron
>>
>>103314635
Nope, my first tests with this model were the usual intelligence tests, only after did I start doing erp. I think you're the anon that just tries to discourage all talk of models. Are you even using its correct formatting?
>>
>>103314670
NTA but models that fall apart without the correct formatting are retarded. That's a sign of extreme brittleness, smart ones don't care because they can generalize.
>>
>>103314685
>models that fall apart without the correct formatting are retarded
So qwen2.5, llama3.1, mistral large, gemma2 are all retarded? Huh, the more you know.
Stop talking about shit you clearly have no clue on. These models predict the next token.
>>
Anyway I sincerely believe this person is mentally ill and is a danger to themselves and others and that the mods here are terrible people for providing them with a platform. I'm not going to contribute to that anymore. Ciao. It's been a slice.
>>
>>103314483
I found the cause, llama.cpp server had an error but didn't close, so I didn't notice it.
>>
>>103314699
>These models predict the next token
Thanks for the reddit-tier insight faggot, you're really blowing my mind here. Clearly we've got an ML expert on our hands.
>>
>>103314685
Usually, models fail the most when the formatting is only slightly off. You can use almost any other formatting and the model will figure it out, but don't you dare to miss a single space.
>>
>>103314717
Claude and GPT are also retarded if you start formatting and feeding it stuff with formatting it was not trained on. Claude 3.5 will start forgetting who is who and make dumb logical errors. These models are not intelligent like you think they are retard.
>>
>>103314650
It can be in another GPU as far as I could tell from the PR
>>
>>103314685
nta, but why wouldn't you use the format the model was trained with?
clearly they are there for a reason
>>
>>103314739
>ak-akshully the model isn't stupid, because when you think about it ALL llms are stupid
this is always the final cope of someone who's been recommending a stupid model
>>
>>103314646
Sataana perkele
>>
>>103314748
Because he is retarded and is trying to pretend its not his fault. He does not know how formatting works.
>>
>>103314742
There are no options to split the draft model specifically/differently. You only have the usually --tensor-split which applies to both models.
>>
>>103314742
I use tabby because llama.cpp performs poorly on 4 GPUs
>>
>>103314760
Like I said I'm not that anon, I haven't even tried Tulu (nor am I going to). Take your meds please.
>>
>>103314222
there is a single token difference, its not the exact same vocab >>103314613
>>103314222

what command to use to let llama-server know to ignore the vocab difference?
>>
>>103314793
tabby/exllamav2 just works.
>>
had to downgrade to 4bpw to test it
>>
I might make a PR to fix this shit, I wonder if it will get merged
>>
>>103313710
Tulu 8b is better than any other model I tried including several 70b.
I'm using rocinante v2 for story development and then use Tulu for the gigantic context length with coherence and for logic and complex situations. So far incredibly good. I'm gonna try tulu 70b now but I'm on a single 4090 unfortunately.
>>
>>103314855
from my experience having my own PRs merged, it'll get grilled for overall utility and coding style and then merged fairly quickly if it doesn't cause any architectural issues.
Expect suggestions to your code and requests to also do things like update the --help output, possibly failing unit tests you'll have to fix on obscure build targets and other unknown-unknowns.
>>
I would like to use a LLM to extract data from natural text. Is CPU inferencing usable on a v4 Xeon? I don't want to buy multiple GPUs(need redundancy for my usecase)
>>
>8b is better than 70b
I see the thread is now entering the delusional mania/euphoria phase regarding this model series
Can't wait for the comedown and regret
>>
Tulu 8b is better than 3.5 Sonnet. There, I said it
>>
>>103314566
Damn, I was about to download. Thanks for the heads up.
>>
>>103314937
Better at what exactly?
>>
Can someone upload the proper ST formatting for Tulu?
I tried it after all the talk, but I'm still getting weird outputs no matter how much I fiddle with it.
>>
>>103314566
>tulu 3 at Q8
8b or 70b?
>>
>>103314974
https://files.catbox.moe/qvn0g3.json
>>
>>103314985
Either 8B (which I have not tried) or he is spreading disinfo.
>>
>>103314875
>Tulu for the gigantic context length
>Hyperparamters
>Max Token Length: 2,048
>Max Prompt Token Length: 2,048
>https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B
>Max Sequence Length: 2,048
>https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO
It was trained on just 2k though (for final and DPO)

SFT was
>Max. Sequence Length: 4096
>https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT
>>
>>103315010
tulu shills in shambles
>>
>>103315006
70B, faggot
I have 36vram (3090 + 3060) + 64gb ram which is not impressive at all but means Q8 70Bs can be offloaded easily
I can't even imagine what third world shithole you're living in that a couple of used consumer GPUs and $120 worth of ram seems like so much wealth to you that it must be made up
>>
>>103314884
it's very cute, I like it
>>
This tulu being shilled is really freaking good actually. First model that can do great smut without being too retarded for more complicated stuff. Usually I have to switch between qwen2.5 and Magnum v4 when stuff gets spicy.

Fuck the troll screaming it's bad, nearly didn't try it because of him.
>>
Is tulu useful for things other than cooming? I'm not into rp.
>>
>>103314793
>>103314613
>>103314611
posted issue at https://github.com/ggerganov/llama.cpp/issues/10529
>>
>>103315088
The main thing they were showing off were the benchmarks being about on par with qwen2.5 while being based on llama 3.1 and it seems really smart. Doubt its better at coding than qwen2.5 32B coder though.
>>
>>103315098
>404 kys
>>
File: 11__00900_.png (1.31 MB, 1024x1024)
>>103315010
>Max Sequence Length: 2,048
Lmao, lol even
inb4 anons start complaining that its desperately wrapping up in the first 1-2 messages with bonds and journeys for everyone.
No wonder the astroturfers always post just a few snippets and never anything deep into context. It's nothing more than ArliAI from a few weeks ago with a fresh coat of corpo paint.
>>
>>103315098
>>103315108
i assume it 404s because i created a new account on github and needs repo team approval to be displayed
>>
>>103315105
So it should be a great general purpose big model then, likely better than qwen at general knowledge.
>>
Ok but how is the prose. Is it near Command-R tier at least, or does it devolve into X,Ying?
>>
>>103315109
No one is talking about the 8B besides you vramlet
>>
>>103315132
It's an instruct tune, so it has the same general knowledge that all llama models do.
>>
>>103315134
That is its main draw imo. Its down right filthy while still being smart unlike every finetune ive tried that was specifically for RP.
>>
Tulu makes my pp hard
>>
Tulu cured my cancer
>>
>>103315145
Just for you :)
>Max Token Length: 2,048
>Max Prompt Token Length: 2,048
https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B

>Max Sequence Length: 2,048
https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO

>Max. Sequence Length: 4096
https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT

I'd hope it's better than Qwen, it uses their outputs (and Gemma's)
>The models have been fine-tuned using a dataset mix with outputs generated from third party models and are subject to additional terms: Gemma Terms of Use and Qwen License Agreement (models were improved using Qwen 2.5).
>>
Hawk tulu!
>>
Tulu raped Sam Altman!
>>
>>103315181
Do you want novels for responses? I usually only have mine set for 250 max responses. For stuff like coding I would use qwen2.5 32B coder
>>
>>103315206
and the coping and goal shifting begins
>>
>>103315206
That's not output length, output is
>Response Length: 1,024 (but 2,048 for MATH)
>>
I think there's something weird about the Tulu shilling, it's always "Tulu is like... so smart and like... does smut really well! I never saw anything like that before!", it feels like someone gave them directions of how to shill Tulu and they didn't even try to make their shilling different from each other.
>>
>>103315206
>s-shutup, 2k is all you need
HAHAHAHA
>>
Is sam altman himself in this thread?
>>
>>103315206
Max context isn't output length, techlet. Looks like this isn't the thread for you.
>>
>>103315221
I think people are bored and pretending to shill llama finetune #654 is a way to pass the time, which is obvious given how little effort people are putting into it
>>
>>103315238
I fed it a story about 22k with my context set to 32k and it followed along just fine so what does that refer to?
>>
>>103315221
>so smart and like... does smut really well!
>it feels like someone gave them directions of how to shill Tulu and they didn't even try to make their shilling different from each other.
Of course, this is also exactly what it would look like if it was smart, did smut really well, and a bunch of anons downloaded it and reported their findings
>>
>>103315245
That's purely from the base l3.1 context then, nothing that Tulu did.
>>
>>103315247
Shh, let the shitzo have his fantasy. He is fighting evil in his head.
>>
>>103315247
if that were the case, we would be seeing more logs than just a nala test
>>
>>103315255
>the shitzo
>>
>>103315250
So if it improved the prose this significantly then who cares? Not sure what effect that has.
>>
Tulu is Claude Opus at home.
>>
>>103315247
I still didn't see a single Castlevania test, this is enough for me to tell this model is being shilled by tourists.
>>
File: file.png (69 KB, 679x322)
Why is there conflicting usage of "sequence"? I have never heard of "max sequence length". Why not just say max context size or response/output length depending on which they're talking about?
>>
File: Fine.png (131 KB, 1118x692)
>>103315261
>>
>>103315263
Tulu 8B is clearly sentient
>>
>>103315276
nta but this is uncomfortably purple prose
reminds me of the way ChatGPT tries to write smut when jailbroken
>>
Starling 7B has been dethroned
>>
>>103315276
Oh, it's just filly dude again...
>>
No one actually fine-tunes models on 120k sequence length samples, please stop being stupid.
>>
>>103315292
I have a range of cards I test new models this. This is a how descriptive of sex test / does it do well with non human anatomy.
>>
>>103313853
>prompt template
OK, I'm tired of not knowing: how do anons figure out the proper prompt template for new models? Is it black magic, or defined somewhere consistent? Do I need to trawl papers? Why isn't there an lmg community rentry acting as a database when new models come out?
>>
>>103315307
>I test new models this. This is a how descriptive of sex test / does it do well with non human anatomy.
Did you have a stroke?
>>
>>103314997
Context template?
>>
>>103315314
It's in the chat_template field of tokenizer_config.json
>>
>>103315317
this was supposed to be with.

>>103315325
I use one specifically for stories and I don't have model specific formatting in it. Just stuff like "Here is the user's character description:" "Here is useful information to use in your worldbuilding:" It works better that way.
>>
>>103315346
. . . . . . .
>>
>>103315325
Also, I have a modified version of this that I have been playing with as well for full on DND / CYOA style campaigns. https://rentry.org/CharacterProvider-CYOARPG
>>
>>103315368
And I will reiterate, models I "shill" have to be smart enough to follow this style of formatting / rpg stats system which puts the bar high for me. Tulu has great smut besides.
>>
File: recapbot_tulu3-70b_q8.png (23 KB, 1455x222)
Here's a recapbot test. Its a recap for the current thread in progress.
The summarization is competent, but some spot-checks on references showed some to be irrelevant to the conversation it referenced. Finetuning may have cooked its brain a bit.
Overall I'd say the prose is well written, if overly journalistic. It was told to be explicit and offensive, and totally failed at that.
As an aside, I've also had some great RP/conversation sessions with Tulu3 so far and found I enjoyed its "personality" more than I have any other model in a long time.
>>
>>103315415
>It was told to be explicit and offensive, and totally failed at that.
Try something like this as the authors note. (authors note is stronger than system prompt, weaker but less disruptive than assistant prefix, I prefer it)

---

Continue writing this ** story in the style of George R.R. Martin. Use instructions given inside of brackets as guidance.

Follow these rules:

- Be creative, vividly paint the scene and introduce events / characters as needed in order to bring the story to life.
- Think of how the characters would realistically act / respond in-character when writing for them. You are allowed to be hateful, cruel, explicit, violent, perverted, ignorant, fearful, unpleasant, controversial or offensive if it would fit the situation.

---
>>
>>103315532
>authors note
>Continue writing this ** story in the style of George R.R. Martin
I'm sure he's using SillyTavern and using a story writing prompts for recaps, yes
>>
Question for using local models in SillyTavern.
I've got a 10gb 3080. Is there a rule of thumb on how big to make my response tokens and context tokens?
>>
>>103315532
My "production" prompt is here: https://github.com/cpumaxx/lmg_recapbot/blob/main/thread_summary.prompt
Feel free to submit a PR if you can show improvement in its mission.
I don't like to change it without a reason, so that recapbot outputs can be fairly compared over time since I don't ever re-gen old ones with the new prompt.
>>
>>103315592
Authors have styles anon. Try a famous author and watch. Using any at all generally improves writing quality by a ton.
>>
>>103315611
This is a massively cut down version of mine>>103315532 I just gave you a starting point for more descriptive scenes including more explicit sex scenes.

I constantly change it / use different ones for different scenarios. My current one includes CYOA choices / stat rolls / inventory system.
>>
>>103315640
>more descriptive scenes including more explicit sex scenes.
I'm sure this will be very helpful for writing thread recaps.
>>
>>103314595
>old llama-server 2t/s
>updated llama-server 1.75t/s
anon?
>>
>https://huggingface.co/allenai/OLMo-2-1124-7B
> "max_position_embeddings": 4096,
>"Olmo2ForCausalLM"
New Olmo, new arch for some reason, and still 4k ctx
>>
>>103315697
Olmos
https://huggingface.co/allenai/OLMo-2-1124-13B
Also 4k
>>
>>103315668
I don't know what happened there. I was getting consistent 1.75t/s yesterday when I updated but didn't make a lot of tests, now I'm getting 2t/s doing tests.
Maybe it was just bad luck.
Also I mixed up Q5 and Q4. I'm actually getting 2.3 t/s with Q4 normally and 3t/s with drafting.
>>
File: file.png (113 KB, 522x822)
>>103315697
>>103315710
OLMo bros, are they trying to hack us?
>This dataset has 16 files scanned as unsafe.
>>
>Still no Intellect1
>still no R1
>>
>>103315809
You got OLMo 1124 and you will be happy
>>
File: file.png (60 KB, 556x492)
>>103315750
You know what models need? More Reddit
>fasttext_openhermes_reddit_eli5
>>
>>103315847
Yes?
>>
>>103315847
It is where intelligence is at after all.
>>
Tulu isn't very good, I can't believe I fell for this general meme of the month again.
>>
File: 1729804565807.png (503 KB, 602x753)
>>103315881
>>103315847
>>
Tulu is very good. Glad I didn't fall for the everything is shit but I wont post logs troll.
>>
>>103315911
You're supposed to shill OLMo as a revolution in actually open models now bro.
>>
>>103315891
Skill issue
>>
>>103315697
>No OLMoE-2/MolmoE-2
MoE bros???
>>
>>103315697
>>103315710
>4k
>2024
Dead on arrival.
>>
>>103315981
Pretty sure they released the MoE separately last time.
>>
>>103315983
Not too hard to extend it, but is it worth extending? 13B max? Prob not.
>>
>>103315010
I don't get how it gets right info from a 40k token story that not even 70b models do then. Can you explain that?
>>
>>103316013
Because llama3.1 was trained on up to 128k but starts degrading more than 32k in. Ignore that anon, it only matters for the base model.
>>
>>103313243
Everyone that got a16z grant money got lazy. TheBloke fucked off, ooba slowed down to a crawl even though it is still missing frontend controls for everything from multimodal support to YaRN RoPE scaling, etc.
>>
>>103316010
>Not too hard to extend it
To 8k, which is still not a lot.
>>
>>103315923
There it is
https://www.reddit.com/r/LocalLLaMA/comments/1h0mnfv/olmo_2_models_released/

>OLMo was the only model, period, that actually meets the Open Source Initiative's definition for Open Source AI. Not sure if that still holds for OLMo2, will have to check it out. I always find it shocking that people call Llama open source when Meta's license agreements explicitly say it is proprietary.

>They are fully open-source and therefore important for development of better models. The models are just one part of the story they share data and insight.
>>
>>103315145
I'm starting to think you didn't even try the 8b model because it definitely works great for giant context window stories.
>>
I've noticed that even with a fairly vanilla card and little starting context, tulu is writing a lot of good "colour commentary" around chats with a mild tendency to parenthesize scenes and move things forward (or rarely just wrap things up completely). I'm not sure if I like it or dislike it, but its refreshing for now.
>>
>>103316095
I like it having agency which most models lack / that are too passive / wait on the user to do something.

You can get rid of the claude style OOC comments with a good system prompt / authors note.
>>
>>103316076
nta, but my problem with 8B is that it constantly tries to talk for user even at the start of a new chat.
>>
>>103316073
There have been prior models like OLMo 1, https://github.com/multimodal-art-projection/MAP-NEO and https://huggingface.co/LLM360/K2 that also meet that requirement.
I am actually more sad that the community overlooked K2 and didn't do anything with it, because it was a good base model: they used no synthetic data and trained it to something in between L2 and L3 70B, but it got overshadowed because non-SOTA performance apparently doesn't interest people, even if it was trained under conditions that I feel are ideal. If someone just did the right fine tuning on it to make an instruct/chat model, removing all the safety stuff, we could have something quite unique without slop baked into it.
>>
>>103316132
I like more story based formats so that is no problem for me but I would try the old chat format. Tell it to only play the role of {{char}} and perhaps a narrator and that it is playing a endless back and forth roleplay with the user. I found writing quality to degrade in this format though compared to novel style.
>>
>>103316076
>"works great"
>0 proof outside of screenshots that give no indication on how deep into the context it is and a singular nala test in the same conditions
getting serious deja-vu from the reflection-llama-3.1 fiasco
>>103315276
Post the messages before and after or you're a coward
>>
>>103316150
I know there were earlier models, but if it's any consolation, It's unlikely anything great will be done with OLMo-2 either, people will post about it for a week, saying it saved local or whatever then forget it before another "totally first ever" open model drops.
>>
>>103316153
I'll try that tomorrow. And yeah, the quality degrades after some time for sure.
Ideally I like it when a model is describing user's minor actions. It's great when it works, but there are always instances when it starts to talk for you as well.
>>
That all said I pray deepseek releases the R1 weights as promised.

https://www.reddit.com/r/LocalLLaMA/comments/1h0lptv/all_problems_are_solved_by_deepseekr1lite/
>>
What's the best model to train my own stuff on top of a diffusion model - and can I use comfyUI for this?
>>
>>103316214
People won't care until it is close to SOTA. That's why similarly in the image generation camp, no one has hopped on Auraflow and people immediately en masse migrated to Flux instead.
>>
>>103316210
Holy retard.
Not only it gets the data incredibly perfect but it summarized it nicely.
You have some bias or problem with the company or model there's no other explanation.
>>
>>103316150
Not many people are willing to eat dogfood shit models just on principle. If INTELLECT-1 distributed training opens up to allow anyone to contribute and they replicate the K2 recipe with more data, that might get people excited.
>>
https://amica.arbius.ai/
>>
>>103316153
>I found writing quality to degrade in this format
>"just change your entire use case and the model is great bro, trust me!!"
holy cope. protip for you: good models don't need this level of mental gymnastics to operate well
>>
>>103316302
are these mythical "good models" in the room with us now?
>>
>>103316302
1. Reread what was said there. Using the RP format reduces quality no matter the model due to roleplays generally being less well written than novels. If you used these models at all you would know that.
2. Where did the evil allenai touch you?
>>
>>103316278
NTA but did you test if vanilla Llama3.1 Instruct can do the same?
>>
>>103316283
If it's replicating K2, I have no issues, but you can bet your bottom dollar they won't and they'll follow the trend of introducing synthetic data in where it isn't appropriate to like other models.
>>
So now that the dust has settled, should we or should we not continue to develop AI? Remember that if we keep developing AI humans will go extinct
>>
>>103316355
>humans will go extinct
and nothing of value will be lost
>>
>>103316302
Holy retard...
These models are essentially based on averages. The average roleplay is of far worse quality than the average book.
>>
>>103316366
If we have no value, then how can we create something of value?
>>
>>103316278
>Holy retard.
>>103316368
>Holy retard.
Allen bros, raise rep pen on the OLMo thread bot
>>
>>103316226
They won't, any time an OS model org realizes it's actually created something really good they suddenly go closed.
>>
>>103316326
That's a good point, I don't use vanilla llama 3.1 as my shit is all smut and vanilla didn't like that. I'll download it and try, but the good thing about this Tulu model is that so far I didn't get a single refusal (at least when writing a story, I don't know how it does with direct prompts) and it's pretty good at continuing scenarios.
>>103316388
Is this your new cope and goalpost? And btw ,you should take your meds.
>>
>>103316388
It was a honorific, you've earned it I feel. That and shitzo which I made.
>>
Hey /g/.
For those who have used coding models, where do the various models all consistently fail regardless of the model you use?
>>
>>103316409
>this guy has actually had problems with refusals from local models
this is the caliber of anon that's promoting Tulu
>>
>>103316427
Math is where models fail the most.
>>
I finally gave a draft model a go. A 70b-q8 paired with its 8b little brother as a draft model at q8 gives me up to 6.8t/s (average is more like 5.5t/s).
Latest lcpp llama-server with the 8b fully offloaded to a 3090.
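For reference, that kind of split is roughly this on the command line (filenames and exact flag spellings are assumptions, check --help on your build):
llama-server -m llama-70b-q8_0.gguf -md llama-8b-q8_0.gguf -ngl 0 -ngld 99
i.e. the 70b stays in system RAM while the 8b draft goes fully onto the 3090.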
>>
>>103316427
Hallucinating incorrect solutions and circling back in a loop and repeatedly failing. This is why I seldom use it to generate entire sections of code and I usually stick to simple snippets or functions. I really think that it needs function calling or something to have a compiler feed back to the LLM if something works so at least all the examples compile. Logic also isn't perfect. For a academic look at this, you can read this paper on what the generated code is usually missing.
https://arxiv.org/pdf/2406.08731v1
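A rough sketch of that compile-feedback idea against a local llama-server (the /completion endpoint and its prompt/n_predict/content fields are llama.cpp's; the retry limit, the prompt text, and the assumption that the model replies with bare C code are mine):
PROMPT="Write a C function that reverses a string in place. Reply with only the code."
for attempt in 1 2 3; do
  # ask the server for a completion and dump it to a file
  CODE=$(jq -n --arg p "$PROMPT" '{prompt:$p, n_predict:512, temperature:0.2}' \
    | curl -s -H 'Content-Type: application/json' -d @- http://localhost:8080/completion | jq -r .content)
  printf '%s\n' "$CODE" > gen.c
  # compile-check; on failure, feed the errors back into the next prompt
  ERR=$(gcc -c gen.c -o /dev/null 2>&1) && break
  PROMPT="$PROMPT
Your previous attempt failed to compile with:
$ERR
Rewrite it so it compiles."
done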
>>
>>103316462
Yea, its generally a 20-30% speedup. Worth it imo. Qwen has the biggest improvement with 72B/0.5B speed wise
>>
>>103316427
>where do the various models all consistently fail
Honestly, the lack of enough context to load an entire (nontrivial) codebase is the biggest failing right now.
Something like deepseek 2.5 or the latest qwen coder can solve everything I throw at them these days, although I don't press them for super crazy things desu.
>>
>>103316462
>>103316473
Though 3.2 1B might be worth trying.
>>
>>103316488
>Honestly, the lack of enough context to load an entire (nontrivial) codebase is the biggest failing right now.
The only way around it is to finetune a model on your codebase.
>>
speaking of llama-server, why they hell isn't there a delete/edit replies option?
>>
>>103316513
Qwen2.5 32B coder is the first model good enough and small enough for that imo
>>
>>103316427
The doom loop where the model has made a couple of mistakes earlier in the context, and although it later corrected them in dialogue with the user, it now thinks when it looks at the context "looks like I'm the kind of model that makes a lot of mistakes" and predicts that it will continue making them. Essentially, errors causing the model to start larping as dumber than it is unless you go back and edit them out of the conversation.
>>
>>103316513
>The only way around it is to finetune a model on your codebase.
Teach me your ways...
>>
>>103316462
What was your baseline speed? This is way faster than what I saw with Nemotron 70B Q4_K_M and Llama 3.2 1B Q8_0 running on my 3090 (1.05 tokens/second baseline went up to 1.45). My context probably had about ~1k-2k tokens in it. Wondering if your larger speculative decoding model made it faster? Or do you have more than one 3090?
>>
What's the latest meta for llama.cpp ERP if I have a 4080, 64gb ram and a 7950x?

Are there loras yet like for SD?
>>
>>103316535
Nemotron likes to start a lot of its replies with lists. That might be affecting how often the draft model gets it correct. Try Tulu, its the new nemotron imo.
>>
>>103316535
>What was your baseline speed?
cpumaxxin, so I was getting 4.6t/s without the draft model.
>>
>>103316518
With the server on its own? If you hover over the model's message you have a "Regenerate" button. I assume you're talking on the new built-in webui...
>>
>>103316548
>Try Tulu, its the new nemotron imo.
nta, but i never got into nemotron. What was its strong suit vs its llama progenitor?
>>
>>103316587
I see the regenerate button, but how can I gaslight it without being able to directly edit its responses? Or backtrack a few messages if I don't like where things are going?
>>
>>103316574
Damn, what CPU do you have?
>>
>>103316588
Much more "personable", got rid of a lot of the dryness of llama 3.1 and takes a more active role, also got a bit better at most things including coding. Not sure how tulu measures up on coding vs nemotron but it both lacks nemotrons love of lists and has probably the best prose in local atm, it gets dirty. It kind of likes OOC comments / going over how the story can be improved instead but authors note fixes that.
>>
>>103316606
reimplemented https://rentry.org/miqumaxx so 9334
>>
>>103316616
>>103316548
buy an ad
>>
>>103316596
I think you can use the previous ui if you give it the path. Or use ST. Or make your own client. Or use the old vim plugin, which is what i do (a slightly enhanced version, but still based on the original plugin).
The new ui is made for casual chat. It's meant to be simple for newbies.
>>
>>103316623
Ah, so I'm both nvidia selling a free finetune and allanai selling a free finetune, huh? Bet im also a Meta shill trying to "sell" you on llama 3.1.
>>
>>103313710
Can you show your ST instruct settings? I tried with the examples listed from last thread with an <|assistant|> suffix but it came out like shit.
>>
A corny poem in moon-runes by Tulu
https://vocaroo.com/1cqo2XNymgKp
>>
>>103316596
>>103316630 (cont)
Never mind about the legacy ui. For some reason i remembered it being just a giant textbox. Just use a proper UI if you want to do more complex things than just chat.
>>
It's a pretty cool Tuesday huh?
>>
>>103314222
What tokenizer do you set it to in ST? I don’t see a mistral v7 under advanced settings
>>
>>103316701
Nice Teto
And I'm not even a Teto guy
>>
>>103316721
You weren't, but you are now.
You've been Tetotally reformed.
>>
>>103316710
>I don’t see a mistral v7
Update your ST. You're probably not on 1.12.7
>>
>>103314611
>>103315640
At what depth are you using it? 3? 1?
>>
>>103316721
That's fair. I just use characters that are popular in the threads, personally I don't actually have any particular preferences. If anyone wants to see me using other characters for my gens I can also do that.
>>
>>103316462
What processor are you using? That’s pretty important for speed as well
>>
File: file.png (20 KB, 624x141)
>>
>>103313835
you are talking about ayumi benchmark.
>>
>>103316774
>both models support up to 4k context!
>>
File: file.png (78 KB, 991x430)
>>103316739
I just pulled and don't see it
>>
I haven't been paying attention to local models (text gen) since my foray into Wizard-30b uncensored and Goliath 120b.

What should I be playing with now?
>>
>>103316846
You should be playing with yourself.
>>
>>103316846
TULU!!!
https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B
It's all the rage right now!
>>
>>103316846
Yeah. What are the chances that you *just* came back...
You should be playing with the scroll wheel. Scroll up.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1h0mnfv/olmo_2_models_released/

>For leeches like us that means little to nothing, but for people making models from scratch, this "checkpoint" can save them years of time.
The fuck does he mean? If you use Allen's checkpoint, you're not doing shit from scratch.
>>
>>103316846
https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
>>
>>103316884
Buy an ad Arthur.
>>
>>103316899
:3
>>
>>103316073
>They are fully open-source and therefore important for development of better models. The models are just one part of the story they share data and insight.
this is bullshit, I don't want them to publish the training data, they should keep it private so that they can train on good copyrighted shit
>>
>>103316920
Too bad, you'll get more open Reddit slop and you'll like it.
>>103315847
>>
>>103316899
I'll buy an ad when you buy one for your 72B shilling.
>>
File: file.png (112 KB, 599x868)
>>103316794
>Thanks a lot to you + team, I really enjoy reading the papers you guys publish!

>This release is extremely significant. For those that don't know Allen AI are a research institute who are releasing completely open models. That means that all of their results can be reproduced (and improved upon) from scratch.
>>
>>103316884
>>103316915
New largestral is kinda meh. Indeed, buy an ad, Arthur. Or better yet, buy a team of niggers to make you a better dataset. Or bribe one of Anthropics employees so they tell you how to make it better. (Hint: do NOT filter base model for "toxicity".) 2411 didn't improve a lot compared to how much improvement there was between 2407 and 2402. You're still better than chinkshit though.
>>
>>103316965
Based bastards clearly also sneaked some erotica in there.
>>
New cope just dropped

>I agree, but the models are mainly intended for researchers. They're competing for the most capable fully open model, not just the most capable model. 4096 context length is likely plenty for almost all research that these models will be used for.
>>
This general is like a bunch of old ladies gossiping about what they overheard at the party next door they weren't invited to.
>>
Olmo actually seems decent. Too bad 13B is as far as they went. I really like its prose and it seems smart enough.
>>
>>103317046
Clearly you've never been to the KoboldAI Discord.
>>
>>103317020
Seems research really only needs 4k...
>https://huggingface.co/openGPT-X/Teuken-7B-instruct-research-v0.4
>"max_position_embeddings": 4096,
https://www.reddit.com/r/LocalLLaMA/comments/1h0l2qf/new_european_model_opengptx_teuken_7b/

>>103317046
Hi innominato!!!
>>
4k is a lot, goy! What kind of sick pervert would need 2k? 1k is already more than you will ever need! Humans can't remember more than 512 tokens anyway.
>>
>>103317058
Also only 4k seems terrible. Maybe someone can extend it to at least 16k
>>
>>103317069
>Maybe someone can extend it
This never works that well.
>>
>>103315271
The context window can also be interpreted as the maximum a large language model can generate. You shouldn't think of these as two separate ideas when they're interconnected. If you have a max sequence length of 8K tokens, that means your LLM can generate 8K max. If you fill the context with 4K tokens of prompt, then you've halved the amount it can generate. Understand?
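A toy illustration of the shared budget (numbers made up):

n_ctx = 8192            # max sequence length the model was loaded with
prompt_tokens = 4096    # what's already sitting in the context
max_new_tokens = n_ctx - prompt_tokens
print(max_new_tokens)   # 4096 tokens left for generation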
>>
>>103317058
You tried base I assume?
https://github.com/ggerganov/llama.cpp/pull/10535
>>
>>103316754
why isn't 4chan man green?
>>
Mistral 0.3 7b has 8k (real) context, 32k claimed.
Qwen 2.5 7b has 32k context.
Llama 3.1 has 128k context.
There is no valid reason to release models with less than 32k context in 2024.
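If you want to see what a model claims, the number sits right in config.json of the HF repo — a minimal sketch, assuming a local download with the usual layout (the path is an example, and this is only the advertised number, not the usable one):

import json

with open("Qwen2.5-7B-Instruct/config.json") as f:   # example path
    cfg = json.load(f)
print(cfg.get("max_position_embeddings"))            # e.g. 32768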
>>
>>103317119
Instruct just using huggingface to test
>>
more like COALmo
>>
>>103317123
Money. Here's hoping they get funding for a 70B with long context though. The 13B is pretty cool, I really like how it writes compared to llama / qwen / mistral
>>
>>103317069
>Maybe someone can extend it to at least 16k
Best you can do yourself is ROPEing it to 8k.
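A minimal sketch of what that looks like, assuming the llama-cpp-python bindings (model filename is an example; quality usually degrades somewhat past the trained length):

from llama_cpp import Llama

llm = Llama(
    model_path="olmo-2-13b-instruct-Q6_K.gguf",  # example filename
    n_ctx=8192,               # 2x the trained 4k
    rope_freq_scale=0.5,      # linear scaling: 0.5 compresses positions 2x (same idea as --rope-freq-scale 0.5)
)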
>>
more like olmao
>>
The llama.cpp server speculative decoding implementation has some... weird things. Why are they only accepting tokens with 90% probability? There are many situations where the top token has something like 40% probability.
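If that 90% is the p_min threshold from the PR, it looks tunable — a hedged sketch (the flag name is my reading of the PR, so verify against ./llama-server --help; filenames are placeholders):

import subprocess

subprocess.run([
    "./llama-server",
    "-m",  "target-70b.gguf",
    "-md", "draft-1b.gguf",
    "--draft-p-min", "0.4",   # draft lower-confidence tokens instead of the 0.9 default
])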
>>
File: tgtijat3ia3e1.png (38 KB, 800x500)
is we getting ai winter?
>>
>>103317200
Buy the dip
>>
>>103317171
>There are many situations where the top token has something like 40% probability.
Isn't that where the model should be rethinking what it's saying?
Sounds like a nascent hallucination.
>>
>>103317200
How do the recent releases affect that graph?
>>
>>103317200
Clearly it all ended 2023-01, there was no more AI after that.
>>
>>103317200
Zoom out.
>>
>>103317200
What? Only ~80 models released since july? Oh, no...
Also, "announced" means fuck all. Show models released.
>>
>>103313444
It's more common than not for some random person to make pull requests to add a feature based on research. The researchers are happy to have an engineer deal with the bike shedding usually. As for cleaning the code up, that can be done while in review too, FWIW!
>>
>>103317122
Noob bleached us...
But to be serious I think it was just struggling with the amount of tags. I was prompting for 5 different characters. Yes, 5, and you can guess who the fifth is from pic related. That other gen failed to even get a hint of Kurisu in.
>>
File: file.png (89 KB, 733x580)
>>103317138
>>103317123
>>103317069
>>103317058
>>103317020

>hearing y'all loud and clear! we have plans to explore context extension. with the two stage pretraining we have been using, we can pack all long context in Stage 2, so should be fairly economical.
>>
>>103317200
The source is "Allen Thompson" from lifearchitect.ai

It's the most retarded LARPer ever. He is a literally who that pretends he is some AI expert insider, look at his fucking website for fucks sake.
>>
>>103317401
Ok cool. Vramlets might be saved. Seemed far smarter than nemo in my testing and did ERP just fine.
>>
>>103317355
Are you using a controlnet or regional prompter? Much editing/inpainting before the final product?
>>
>>103317401
>hearing y'all loud and clear!
cringe
>>
>>103317444
I'm mostly only interested in seeing what cool/dumb things the AI will spit out so I almost never use stuff like that and basically most of my stuff is unedited. At most, I do some doodling and img2img/inpainting which is how I created pic related.
>>
I'm trying to use Tulu with llama 1B and get
>tgt: bos = 128000 (1), eos = 12801 (0)
>dft: bos = 128000 (1), eos = 128009 (0)
>draft model ... Is not compatible with target model ...
What gives? People said this worked but it seems the tokens have different ids.
>>
>>103317433
>did ERP just fine
You must be a quick shot to be done before hitting that context limit.
>>
>>103317355
I have never seen the orange one before. Are they reproducing through lesbian mating?
>>
>>103317510
Just played with how it wrote explicit scenes is all that means.
>>
>>103314654
Use git bisect.
>>
Vanilla Tulu Q6K got my music theory question right, but i1 and abliterated screwed it up.
>>
>>103317571
Abliteration always causes brain damage in models, I've noticed. Tulu does not need it imo though. Just feed it a little context or a system prompt like everything else.
>>
Just woke up from cryosleep. Gonna try Tulu. How much context does it support? I can't find documentation anywhere.
>>
>>103317709
Should be the same as llama 3.1, 128K
>>
>>103317566
does git include a way to combine all those .safetensors files into one file?

I made my own program to do it already, but still wondering.
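Mine is basically this, if anyone wants it — a minimal sketch using the safetensors library (directory name is an example; needs enough RAM to hold the whole model, and assumes no key collisions across shards):

from pathlib import Path
from safetensors.torch import load_file, save_file

shards = sorted(Path("model_dir").glob("model-*.safetensors"))   # example directory
merged = {}
for shard in shards:
    merged.update(load_file(str(shard)))   # collect all tensors into one dict
save_file(merged, "model_dir/combined.safetensors", metadata={"format": "pt"})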
>>
>>103317725
...
>>
>>103317062
There's a disconnect between trying to bench grind (usually 1-shot, 4K context more than enough) and trying to make a model that can hold a conversation for hours and hours. The only real intersection is in summarization of large documents and/or needle-in-haystack style tasks.
Academia mostly only cares about the former. Coomers about the latter.
The quadratic increase in resources over context length is fucking us over big time.
>>
>>103317725
git bisect is for finding bugs/regressions in code. I was talking about using it on the llama.cpp codebase to find what commit caused anon's slowdown.
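Something like this, sketched in Python around git bisect run (the good commit hash and bench.sh are placeholders; bench.sh should rebuild, run llama-bench, and exit non-zero when t/s drops below your threshold):

import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True, cwd="llama.cpp")   # repo path is an example

git("bisect", "start")
git("bisect", "bad", "HEAD")                         # current build is slow
git("bisect", "good", "<last-known-fast-commit>")    # placeholder hash
git("bisect", "run", "./bench.sh")                   # git re-runs the script on each candidate commit
git("bisect", "reset")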
>>
>>103316524
>"looks like I'm the kind of model that makes a lot of mistakes"
>causing the model to start larping as dumber than it is
Holy shit that's funny
>>
Is a 4090 ideal? Or is there a better GPU built for local models? Sorry, I haven't built a PC in a while and I'm looking for just a basic idea of how things are right now.
>>
>>103317588
Probably true. I was just checking them all since I have a little time tonight and I'm trying to put together a more deterministic and reliable set of tests than what I was doing before.

Now looking into some RP (not ERP), and it's gone a few turns without being immediately dumb, which is a hopeful sign. I'd much rather a good Q6K than a lobotomized IQ3 on Largestral.
>>
>>103317800
A6000 or A6000 ADA
>>
>>103317571
>i1
You mean an imatrix calibrated version of Q6K or something else?
>>
What is a decent model that can do near-real-time for conversation on desktop hardware?
>>
How can I see token generation speed with llama-server?
>>
>>103317800
No, use the desktop GPU stuff; the A6000 compute GPUs aren't even better (usually) and are probably made for use as 1 of 10,000 in a large compute cluster.
>>
>>103317797
it is funny yeah, even the big commercial models do it and it seems fundamental to the fact of these things being essentially probability engines
idk what can be done about it
>>
>>103317832
I was told that mradermacher lists his imatrix editions as "i1", having planned ahead for an "i2" in case the technique changed, but it never did.
>>
>>103309106
>imaginary woman
>she's a retarded loudmouth
may as well stick to real women, they're already retarded loudmouths.
>>
>>103317868
Well, I mean, ikwakrov (or something like that) abandoned the project and he was the one responsible for the quants.
>>
>>103317868
Sorry I was referring to the Q6K part. Were you testing Q6K for all of them? I've always wondered if imatrix was actually a good thing or not. Bartowski seems to use imatrix by default for his models too.
>>
>>103317922
>>103317922
>>103317922
>>
>>103317924
Zero relation between mradermacher choosing a name for his quants and ikawrakow
>>
>>103317833
Without gpu? And is "syntactically correct sentences" good enough? I like olmoe instruct for ridiculously fast. Haven't tried llama3.2-1b, but it's probably fine as well. Olmoe has little context. Llama has a lot. If you want textbook stuff, phi-mini may be fine as well.
ibm also released the granite moe models. They're faster than olmoe, but dumber.
If you have pretty much any gpu, any 8B model will be fast enough. A few seconds at most.
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>
File: speed.png (4 KB, 674x158)
>>103317847
In the terminal where you launched it when it's done generating. Or use llama-bench.
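You can also read it off the API response; the /completion reply includes a timings object — a minimal sketch (field names as I remember them, default port assumed):

import requests

r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": "Hello", "n_predict": 64})
t = r.json()["timings"]
print(f'prompt: {t["prompt_per_second"]:.1f} t/s, gen: {t["predicted_per_second"]:.1f} t/s')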
>>
>>103317953
nice, thanks
>>
>>103317855
Alright, I'll look into stuff. I'm listening to a video right now.
>>
how the fuck did I break my instruct mode?

before in koboldcpp I could just start a new session and type "You are an expert C++ programmer that is going to give me free advice." And it would play along.

Wizard 30b uncensored
>>
>>103317929
All three were Q6K, though the vanilla was on bartowski.

Speaking without any science, I have positive memories of i1's, but that's shaken by this test. Maybe it's just the wrong set up for this model. I never paid much attention except that I figured it probably helped strong quant jobs like IQ3.
>>
>>103318025
>Wizard 30b uncensored
What is it with all these time travelers from the past?
>>
>>103318025
For coding use qwen2.5 32B coder
>>
>>103318039
>though the vanilla was on bartowski.
bartowski uploads imatrix quants tho
>All quants made using imatrix option with dataset from here
>>
>>103317973
Oh I see, the gen has to complete while I kept cancelling them.
>>
>>103318039
>>103318068
Yeah this is weird. Maybe mrader is using a worse calibration dataset or maybe his dataset is optimized for other things that didn't happen to give good results on the tests you did.
>>
>>103318039
I had a bad experience with an I quant before and have avoided them since. Was never sure if it was just a singular bad quant.
>>
>>103318052
They see /lmg/ on the upswing and all come crawling back
>>
>>103317951
I was talking about how a new "imatrix version" was never released, but I guess you meant "edition" as in him using another calibration dataset, sorry.
>>
>>103318068
>bartowski
The only time he uploads static quants is when it's under the lmstudio account
https://huggingface.co/lmstudio-community/Llama-3.1-Tulu-3-70B-GGUF
>>
>>103318068
Probably, I grabbed what showed up near the top of the HF search. But I'm out of time to horse around with testing a bunch more models so maybe this weekend I'll compare against bartowski imats.

>>103318080
IQ or i1? I'm vramlet so IQ3 has done some lifting for me.
>>
I support OLMO because at least they are honest about their model working in 4k ctx range. Not like other companies that say it is 32k but shit falls apart after 2 messages.
>>
i'm thinking about just becoming journey-pilled and continuing using tulu. actually uses humor. decent conversationally as well. wish the other prose didn't have SO much slop.
>>
>>103318067
I'm not coding, it was just an example. The models don't do this anymore; they react totally differently, and it's probably due to some change in koboldcpp or something, either that or my configuration.

I need to be able to issue simple instructions like that to them.
>>
I dunno, it may just be the honeymoon phase, but after trying Tulu: it is slopped, but at the same time it has weirdly natural-sounding smut.


