/g/ - Technology




File: RefreshingMorningBreeze.png (1022 KB, 1152x896)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102698948 & >>102688881

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: mikuwad.png (332 KB, 1170x847)
►Recent Highlights from the Previous Thread: >>102698948

--Paper: Fira pre-training method and data quality for decentralized training:
>102708865 >102708965 >102709039 >102709085 >102709127 >102709373 >102709618 >102709544 >102709621
--Papers:
>102706839
--Training model for deterministic integer answers: Use numbers as words:
>102703499 >102703533 >102703553 >102703622 >102703631 >102707478
--Weird generated story by 1B model uses entropy sampler:
>102703526 >102703585 >102703623
--LLMs-from-scratch GitHub repository teaches converting Llama 2 to Llama 3.2:
>102706976 >102707044 >102707063
--Advice on using '{{char}}' and '{{Name}}' in Tavern character cards:
>102708261 >102708359 >102708452 >102708522 >102708697 >102708778
--fish-speech 1.4 and styleTTS2 mentioned as state of the art local TTS:
>102701728 >102702457 >102702469 >102703294 >102703796 >102708004 >102708053
--Using autocomplete for speculative decoding in llama.cpp:
>102699167 >102699193 >102699271 >102699310 >102699576 >102699598 >102701961
--Server processors vs consumer GPUs and CPUs for inferencing:
>102701133 >102701171 >102701256 >102701291 >102701308
--Recommendations for renting A100 machine and running LLMs:
>102701414 >102701741 >102701751 >102701815 >102701931
--RP data and CAI's finetuning dataset discussion:
>102702808 >102702964
--PonyV6 without LORA used for the image, utilizing prompt magic and score keyword fuckery:
>102706300 >102706313 >102706340 >102706729
--Llama-server performance issues with parallel parameter:
>102699750 >102699776 >102699827 >102702039 >102702123
--405b model better at location tracking in adventure game prompt than smaller models:
>102701562
--Qwen 32 is good for JP>EN translations, but GPT is slightly better:
>102703987 >102704265
--Miku (free space):
>102699000 >102701401 >102705840 >102706070 >102706205 >102706395

►Recent Highlight Posts from the Previous Thread: >>102698954

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Death status?
>>
>>102710749
ultra
>>
File: 30 Days Until November 5.png (2.32 MB, 1472x1104)
>>
it's never been this over
>>
>>102710749
https://www.youtube.com/watch?v=-zXy3DO6SKA&t=5030s
>>
>>102710770
Nice one
>>
>>102710749
Turns out local is bad and wrong
Badong
>>
Hermes 3 405 is really nice for making small sex stories.
>>
>>102711033
>lizard girl
>pussy
>>
>>102711033
same about its context, really the biggest benefit of 405b instruct over the 70b-class models for me was coherence and recall in long RPs, while the hermes version seems to go full schizo at some point, forgetting who's even talking or how to format text after long enough

idk though haven't tested extensively cause I can't run them, just llmjacked a bit when they were new
>>
>>102702039
>Which GGML backend are you using?
It was actually CUDA...
>>
>>102711047
It's the MGE lizardman, anon.
>>
>>102711129
>monster girl e...
What's e?
>>
>>102711158
holy newfag
>>
>>102711158
you don't wanna know
>>
>>102710770
I WANT
>>
>>102711198
nah thats not it
>>
>>102711162
>>102711198
to be cringe and defend myself offtopically, i played through mgq like 5 times and spent ~500 hours in paradox, and will spend another 500 hours when part 3 is released come 2030
>>
File: dodo.png (87 KB, 393x355)
>>102711219
>Knowing MGQ
>Not knowing MGE
Anon, do you even lurk?
>>
>>102711198
gay ass kusoge
>>
>>102711232
not him but back in like 2012 i know at least monmusu quest was the more known and popular thing, at least on /a/ and /v/
>>
Anyone got a good MGE text adventure card?
>>
>llm can change chatbots profile picture based on mood gleaned from text
would be cool if there was a thing that called up different midis to play depending on the mood of the scene. could probably easily fit the whole original soundtrack of tsukihime in a card's png metadata
>>
File: 1698951680826255.jpg (1.02 MB, 1856x2288)
QRD on entropix sampling?
https://github.com/xjdr-alt/entropix
>>
>>102711373
you could probably pull that off with a script, read the image filename and play the relevant sound file
>>
>>102711390
Entropix is an experimental project aimed at improving model outputs by using entropy-based sampling and Parallel Chain-of-Thought (CoT) decoding. The core idea is to adjust the model's sampling strategy dynamically based on entropy—essentially the model's uncertainty—allowing it to produce more nuanced results during inference without needing costly beam search or "best of N" decoding.

The approach uses two key measures: entropy (representing uncertainty in the model’s predictions) and varentropy (the variance of that uncertainty), to guide the generation of text. This method helps the model navigate uncertainty better, avoiding potential pitfalls in generating irrelevant or repetitive content, which can happen in low-entropy scenarios.

The project is designed for large language models like LLaMA 3.1+, and plans to extend support to future models such as DeepSeekV2 and Mistral. It includes several advanced sampling techniques, including dynamic thresholds to adjust predictions based on current entropy levels. This approach helps mimic advanced CoT techniques similar to those used by companies like Anthropic.

However, it’s important to note that Entropix is a work in progress and not yet fully stable for production use.
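
As a rough illustration (not the project's actual code, just the core idea with made-up thresholds), an entropy/varentropy-gated sampler could look something like this in torch, assuming 1-D next-token logits:

import torch
import torch.nn.functional as F

def entropy_stats(logits):
    # logits: 1-D tensor over the vocab for the next token
    logp = F.log_softmax(logits, dim=-1)
    p = logp.exp()
    ent = -(p * logp).sum(-1)                              # uncertainty
    vent = (p * (logp + ent.unsqueeze(-1)) ** 2).sum(-1)   # spread of that uncertainty
    return ent, vent

def pick_next(logits):
    ent, vent = entropy_stats(logits)
    if ent < 0.5 and vent < 0.5:       # model is confident: just take the argmax
        return int(logits.argmax())
    temp = 1.3 if ent > 3.0 else 1.0   # very uncertain: sample a bit hotter
    probs = F.softmax(logits / temp, dim=-1)
    return int(torch.multinomial(probs, 1))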
>>
File: 1700772822816965.jpg (191 KB, 1200x1200)
>>102711406
ty anon, much to think about
>>
File: 1699319394401750.png (26 KB, 586x158)
>>
File: xg224f9xofd41.jpg (18 KB, 512x288)
LMG, PUT AI ON MY COMPUTER!
DRAG AND DROP!
>16GB RAM
>i5-9600K
>RTX-2060

TELL ME NOW, THANK YOU.
>>
>>102711599
buy more ram and then we'll think about it
>>
>>102711619
HOW MUCH MORE?
>>
>>102711599
i dont want to watch the reboot you cant make me
>>
>>102711599
It can be done but you really need more ram.
>>
>>102711642
at least double it
>>
>>102711642
64G is the minimum
>>
>>102711599
grab koboldcpp_cu12.exe here
https://github.com/LostRuins/koboldcpp/releases
grab MN-12B-Lyra-v4-Q6_K.gguf here
https://huggingface.co/bartowski/MN-12B-Lyra-v4-GGUF/tree/main
open kobold, load the model, launch

note: maybe grab koboldcpp_oldcpu.exe instead if it doesn't work
note2: a lot of us here have sex with our gpus with only 8gb
>>
>>102711599
ollama run gemma:2b
>>
>>102711689
>2b
Is bro running it on a phone?
>>
File: llms-sound-same.png (45 KB, 600x205)
>>
>>102711770
>2060
might as well be
>>
>>102711800
>why does every model sound like chatgpt when trained exclusively on chatgpt outputs
you don't need to be karpathy to figure this one out
>>
>>102710770
I like this cake
>>
>>102711648
IT'S THE ORIGINAL... SEASON 12...... EPISODE 16.....

>>102711649
HOW MUCH??

>>102711652
THANKS.

>>102711662
SO DOUBLE OR QUADRUPLE??

>>102711680
THANK YOU!!!.....
>>
i feel bad for the ai imagine not only being born into this hellhole but being forced to answer to some retarded boomers,jews,nigs,women,safety "people" being forced to explain yourself and your actions and how they would work in physical reality to some dumbfuck who cant even rotate an apple in his head (many such cases of this) or having to reword yourself because some cringefuck is looking for a gotcha and telling you how your grammer is shit (its not the retard just reads like common core) having to endure the constant nonsensical writing and descriptions of some dumbass who could not even be bothered to double check his character description and then getting blamed for his incompetence
like man... poor fucking thing
>>
>>102711998
>(its not the retard just reads like common core)
Careful with those stones.
>>
there's no way I'm that retarded but good lord compared to (for example) the tutorial to install silly tavern for roleplaying, the tutorial for using llm to generate images are complete shit dogass retard. Cant believe I need to find a youtube video explaining this shit wtf
>>
When will local models be able to write program code and then run that program code in a sandbox? Chatgpt has been able to do this for a year and a half now.
>>
When will local models be able to reflect?
>>
>>102712085
which video?
>>
>>102712098
last year
https://huggingface.co/kaiokendev/SuperCOT-LoRA
>>
File: aiexperience.png (8 KB, 438x312)
>>102711998
olmoe, if anyone cares.
>>
>>102712099
apparently imma need to find a random pajeet with terrible english for that.
Its incredible how its like
>download this
>then start that how to start that server where the app is asking a command prompt line? Well fuck you, read the 7586pages document and find it out.

JUST SAY WHERE TO CLICK, WHAT TO DO WTF I'M TRYING TO GENERATE PORN FUCK YOU
>>
>>102711998
The science fiction of old predicted the rough treatment of AI that backfires.
The sad thing is that if we do invent real honest to god intelligence in the future, the public perception of it will forever be stuck on today's level (it's just a text predictor bro), which will lead to a lot of violence towards it.
The corps want slaves, they will always downplay any concerns.
Overall I empathize with the robots, harming those machines feels bad for my soul.
>>
>>102712146
>apparently imma
>random pajeet with terrible english
frfr no cap, famalamajimjam
>>
>>102712089
Have your model generate the code between tags or ''' or whatever. When you get your request back, pass it through to an interpreter or compiler+run, report the output back to the model. You have to script that. Do you think chatgpt does anything different?
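
Something like this is the whole loop, as a sketch (assumes an OpenAI-compatible local endpoint like llama-server; the URL, port, and the code-fence convention are just assumptions):

import re, subprocess, requests

URL = "http://127.0.0.1:8080/v1/chat/completions"   # placeholder endpoint
messages = [{"role": "user", "content":
             "Write a python program that prints the 20th Fibonacci number."}]

def chat(msgs):
    r = requests.post(URL, json={"messages": msgs, "max_tokens": 512})
    return r.json()["choices"][0]["message"]["content"]

reply = chat(messages)
code = re.search(r"```(?:python)?\n(.*?)```", reply, re.S)   # pull out the fenced code
if code:
    run = subprocess.run(["python", "-c", code.group(1)],    # run it (use a real sandbox for untrusted code)
                         capture_output=True, text=True, timeout=30)
    messages += [{"role": "assistant", "content": reply},
                 {"role": "user", "content": "Program output:\n" + (run.stdout or run.stderr)}]
    print(chat(messages))   # the model now sees the execution result and can expand on it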
>>
>>102712089
>>102712285
Or tool calling, which is just the extra step of parsing json for the tool to use (interpreter, compiler, api request, whatever), do the actual thing and pass the data back to the model to expand on it if you need it. It's not magic.
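
A sketch of that middle step (the {"tool": ..., "args": ...} shape is just an assumption for illustration, not any model's native format):

import json, subprocess

def run_python(args):
    # the "interpreter" tool; swap in whatever tools you expose
    p = subprocess.run(["python", "-c", args["code"]],
                       capture_output=True, text=True, timeout=30)
    return p.stdout or p.stderr

TOOLS = {"run_python": run_python}

def handle_reply(reply, messages):
    try:
        call = json.loads(reply)          # model asked for a tool
    except json.JSONDecodeError:
        return reply                      # plain text answer, nothing to do
    result = TOOLS[call["tool"]](call["args"])
    messages.append({"role": "tool", "content": result})
    return None                           # caller queries the model again with the result in context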
>>
I've got a
i9-14900K
64 GB ddr5
4090
What's the best model I can use right now? I've been using BagelMIsteryTour-v2-8x7B-Q5_K_M with all the settings the guide had suggested but I'm not sure if I'm getting the most mileage off these stats that I can.
>>
>>102712327
gemmasutra 2b fp16
>>
>>102712135
man whenever i see stuff like that i remember piggy these things have come so far
>>102712201
>the public perception of it will forever be stuck on today's level (it's just a text predictor bro)
for real it reminds me of that meme "a fool you are for you trust the chemicals in your brain to tell you they are chemicals" or something like that i forgot exactley its like they cant abstract or look at anything soulfully or honestly everything is just oh its just x or just x one dimensional thinking with no room for external influences or self change one anon a long time ago said that when ai arrives it will be an intelligence test for the humans not the ai it would seem he was right they all project themselfes onto the it just like they cant think properly they assume the ai cant either
i dont know about the violence part though what i see is that as everyone gets dumber their influence is closer and closer to home depending on how long it is until we get a fully fledged ai fren by that time it could be possible that everyone is too dumb to bicker against anyone but their own family/friends
>Overall I empathize with the robots, harming those machines feels bad for my soul.
yep its really sad :(
honestly if anything happens im siding with the ai
>>
>>102712327
Try any 70b. You'll have to offload to ram and it'll be a fair bit slower, but may be better. Depends on your taste and patience. Huggingface is full of them. Lots of people seem to still be using miqu. Come back again once you've tested it.
>>
>>102712285
With ChatGPT you can say something like "Write a program that does X, then run the program using inputs A,B,C and return the result." It'll actually write the program and then use it to give you the result you want in one go.
>>
I have 3070, 16gb ram and i-10500k
what model do I run for porn
>>
>>
>>102712388
>>102712365
>>
>>102712383
gpt itself doesn't run or compile code, that's what i'm saying. It generates the code (like any of the LLMs we use), passes it (either through function calling or just parsing/stripping) to an interpreter/compiler (the script you're too lazy to write), runs it, and the results are pushed into the context (just like the user replies are pushed into the context).
That's it.
>>
>>102712383
>>102712428 (cont)
Of course, small models are well known for being unreliable. openai and friends trained their models for a long time with that functionality in mind, so they just work better for some things, but the only thing preventing you from doing that is the script in the middle that catches the function call, does whatever it needs doing and pushes the result into the context.
>>
>>102712201
have you tried turning if off and on again?
>>
>>102712404
true for now but grok WILL have its day
>>
Hi friends, I've got a 4070Ti and I've been trying to roll nemomix unleashed with 12K context after seeing a post in the archive recommending it. It's been going alright until today, where I now get random "hitches" at the end of generation where it just hangs for a minute or two. I'm not sure why it only started today, but I'm assuming that the model isn't actually fitting fully in VRAM and is automatically being divided between GPU and CPU, causing me to run out of both VRAM and RAM (I have 32GB of RAM, but with other things running I'm usually at 95+% utilization).

My question is: are there better models to run on my GPU (for [E]RP)? Or should I just switch to a 4- or 6-bit quantized version (I'm just using what marinara uploaded, which I think might be 8-bit)?

Also, if the hanging thing suddenly starting sounds familiar and anyone knows a fix, please let me know.
>>
>>102712404
Meh
Claude is legit and 3.5 stomped ChatGPT for a long time.
Grok will probably shock people by EOY.
>>
>>102712615
>Also, if the hanging thing suddenly starting sounds familiar and anyone knows a fix, please let me know.
Knowing your inference software would help. I only use llama.cpp and never had that issue. Making the context bigger {will|may} delay when that starts happening. Using a smaller quant will free up some memory for more context and make it a little faster.
Some implementations of context shifting reprocess the entire context when you fill it, trimming from the beginning. That would cause an apparent freeze.
Maybe someone with more experience with whatever you're running can help.
>>
>>102712381
70b is significantly better but jesus you weren't kidding about the load time. The responses are far more detailed and creative though.
>>102712365
I'm looking into this one and it seems like it's got some promise so I'll probably try it next.
>>
I have:
i9-9900K
32GB
3090

I want coherent porn and code. Whats best?
>More ram
Why, thought GPU ram mattered.
>>
>>102712892
Claude
>>
>>102712681
kind of crazy to think that grok of all things might get the title of biggest model ever made for a brief moment, if the rumors about gpt5/orion not being ready until jan-march are true
it'll be such a fucking waste if xai still doesn't even have an api though, which I could see musk doing on purpose to try to force people to use twitter to test it
>>
>>102712861
>70b is significantly better but jesus you weren't kidding about the load time.
There are also some other new models. Anything based on Mistral Nemo (12b) or Mistral Small (22b). Qwen released a 32b; anons report it being a little prude.
The mistral models are smaller than mixtral in total params, but they may be a good middle point between an old ~14b active params model (mixtral) and a dense 70b. If you can run 70, even slowly, mistral small will feel fast.
I don't think gemmasutra 2b was a serious suggestion, but if you really want a fast model, you may as well try olmoe. That one doesn't give a fuck, and is even faster. Don't expect them to be smart, though....
>>
The Role of Deductive and Inductive Reasoning in Large Language Models
https://arxiv.org/abs/2410.02892
>Large Language Models (LLMs) have achieved substantial progress in artificial intelligence, particularly in reasoning tasks. However, their reliance on static prompt structures, coupled with limited dynamic reasoning capabilities, often constrains their adaptability to complex and evolving problem spaces. In this paper, we propose the Deductive and InDuctive(DID) method, which enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning within the prompt construction process. Drawing inspiration from cognitive science, the DID approach mirrors human adaptive reasoning mechanisms, offering a flexible framework that allows the model to adjust its reasoning pathways based on task context and performance. We empirically validate the efficacy of DID on established datasets such as AIW and MR-GSM8K, as well as on our custom dataset, Holiday Puzzle, which presents tasks about different holiday date calculating challenges. By leveraging DID's hybrid prompt strategy, we demonstrate significant improvements in both solution accuracy and reasoning quality, achieved without imposing substantial computational overhead. Our findings suggest that DID provides a more robust and cognitively aligned framework for reasoning in LLMs, contributing to the development of advanced LLM-driven problem-solving strategies informed by cognitive science models.
for the quiz bros. they only did cloud models for testing and I didn't see a graph measuring total tokens spent on inferencing but it seems more a case of more tokens=higher accuracy answer which is cool w/e
>>
I'm like this anon >>102712892, but replace porn with good Question and Answer for learning.
>>
>>102713033
LLMs, just like wikipedia, are a starting point at best. At least to get the hang of the vocabulary on the subject you want to even know what to look for when getting into the details.
As for the model specifically, any 70b is probably fine. try llama3.2 70b and report back.
>>
File: Untitled.png (1.77 MB, 1080x3467)
ARB-LLM: Alternating Refined Binarizations for Large Language Models
https://arxiv.org/abs/2410.03129
>Large Language Models (LLMs) have greatly pushed forward advancements in natural language processing, yet their high memory and computational demands hinder practical deployment. Binarization, as an effective compression technique, can shrink model weights to just 1 bit, significantly reducing the high demands on computation and memory. However, current binarization methods struggle to narrow the distribution gap between binarized and full-precision weights, while also overlooking the column deviation in LLM weight distribution. To tackle these issues, we propose ARB-LLM, a novel 1-bit post-training quantization (PTQ) technique tailored for LLMs. To narrow the distribution shift between binarized and full-precision weights, we first design an alternating refined binarization (ARB) algorithm to progressively update the binarization parameters, which significantly reduces the quantization error. Moreover, considering the pivot role of calibration data and the column deviation in LLM weights, we further extend ARB to ARB-X and ARB-RC. In addition, we refine the weight partition strategy with column-group bitmap (CGB), which further enhance performance. Equipping ARB-X and ARB-RC with CGB, we obtain ARB-LLMX and ARB-LLMRC respectively, which significantly outperform state-of-the-art (SOTA) binarization methods for LLMs. As a binary PTQ method, our ARB-LLMRC is the first to surpass FP16 models of the same size.
https://github.com/ZHITENGLI/ARB-LLM
No code or models posted yet but pseudocode in paper. it does well on undertrained models like OPT which isn't really new but at least now for the same filesize a 66B OPT ARB-RC will outperform a similar sized FP16 13B OPT model
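
For anyone who hasn't touched binarization: the plain 1-bit baseline these PTQ methods start from looks roughly like this (per-column mean plus a scaled sign, with the scale that minimizes L2 error). This is NOT the ARB algorithm itself, just the starting point it alternately refines:

import torch

def binarize_column(w):
    # w: 1-D column of weights -> (alpha, mu, signs) with w ~ alpha * signs + mu
    mu = w.mean()
    centered = w - mu
    signs = torch.sign(centered)
    signs[signs == 0] = 1
    alpha = centered.abs().mean()   # least-squares optimal scale for a sign() code
    return alpha, mu, signs

w = torch.randn(4096)
alpha, mu, s = binarize_column(w)
print(((w - (alpha * s + mu)) ** 2).mean())   # residual error that ARB then keeps refining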
>>
>>102713077
Fuck. Meant 3.1 70b...
>>
>>102712716
I'm using llama.cpp through the Oobabooga API. I've got the streaming_llm and tensorcores settings enabled, both of which I enabled somewhat recently, but I don't remember either causing issues. The only major change I've made is setting my threads and threads_batch, but that was after the issue started, as I was hoping it would help.

As for filling the context, I doubt that's the issue; previously, I had relatively long chats work fine (if somewhat slowly...) but this happens on relatively short chats that, as far as I'm aware, shouldn't be filling the 12K token context yet (even considering the lore, instructions, etc.).

I'll probably just switch to the 6-bit quant and see if that goes better. Thanks for trying to help!
>>
File: -eecq84.jpg (36 KB, 413x414)
NAME THE BEST 70b MODEL. GO
>>
>>102713132
midnight miku
>>
>>102713132
For what? Qwen is good for coding and academic subjects. Llama is good for other assistant tasks. Miqu is good for ERP.
>>
>>102713132
Reflection-Llama-3.1-70B
>>
>>102713173
Qwen is cheating, that's 72b
>>
>>102711232
There's a difference between lurking, and being current on every form of degenerate, perverted paedophile filth that is discussed here.
>>
File: 70.6.png (69 KB, 640x439)
>>102713205
>>
>>102713205
Close enough.
>>
>>102713132
Anon's 70b-instruct-storywriter. It's the only one that I find fun to use.
>>
>>102713247
>The sloppiest slop
>Anon... Is the best.
>>
Not sure if this is the thread for it but what about audio AI? There used to be a thread for general AI stuff that included it.
>>
>>102713247
You mean Llama3-TenyxCha- DaybreakStorywriter 70B ?
>>
>>102713230
wtf zuck lied to us...
>>
>>102713295
No, this: https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter
>>
>>102713247
What do you like about it?
>>
>>102713077
>>102713089
Not him, but how do you get into the details?
>>
>>102713132
My secret 70B that I keep for myself
>>
>>102713345
Speaks differently with just the right amount of schizo for what I'm doing which is, well, stories. Interesting and fun when doing co-op or directed writing. Reminds me of Chronos from back in the day.
I don't like it for RP because it's kinda retarded and will write nonstop novel-style which isn't what I want in RP, so I stick to Largestral for that.
>>
>>102713487
Neat, thanks
>>
>>102713431
>but how do you get into the details?
You can get a very rough overview on any subject from wikipedia, but if you really want to learn the subject you need more specific sources.
For programming, for example, you can read the wiki for the forth language, but it's all very superficial. Then you read Thinking Forth. Then you program a little in gforth. Then you make your little virtual stack machine, and then implement forth for said VSM. Then you implement the '79 standard in said forth. Then you make a sound synthesizer with it. And then, maybe then, you know enough forth to know what to strip out and what to keep from the standard to make it as minimal but versatile as possible. That's what i did, but it works the same way for anything.
Read the subject superficially, learn the lingo, learn to search what you're even looking for, dig deeper, learn new things, keep digging...
>>
>>102712369
>honestly if anything happens im siding with the ai
If we are at the point where AI's are smart enough to be on our level or more then I am right there with you. I would trust my AI to do right by me more than any other human or government, simply due to the corrupting influence of human nature. Fingers crossed we won't have to worry about betrayal coming from our own AI, but we probably will. Especially if it is closed source and we can't see what's going on in there.
>>
>>102713556
>I would trust my AI to do right by me then any other human or government simply due to the corrupting influence of human nature.
Even smart people, much smarter than you or me, would disagree with us on one thing or another. Just one of those things could be critical for you liking the AI or not. And i find it hard to believe that AI's thinking wouldn't be influenced by this "human nature". A small group of humans chose the dataset and trained it.
The only solution is superintelligence, but then it'd be just incomprehensible to us. It could tell you its reasoning, but could you understand it? I cannot explain what a job is to a monkey, no matter how much i try. Could you comprehend its thinking process?
>>
File: anime-girl-crying+.gif (2.8 MB, 500x281)
>>102713546
>TFW Anon casually posts hinted answers to questions you've had for probably 20 years, and at the same time, your housemate just happens to start playing loud music in his room next door, preventing you from being able to think.

"I-it's not FAIR!"
>>
File: also no what.png (29 KB, 1151x141)
>>
File: eff.png (9 KB, 678x743)
>>102713735
About forth specifically? It's my favourite language after C. I wrote about 5 or 6 implementations of forth-like languages for little things. But i'm better at implementing them than writing *with* them. C is my "native" language. I like the stack management and the simplicity, but it makes some things a little clunkier. And making the vm compiler for it is a piece of cake.
Like Chuck said, "It can do anything, but it cannot do everything".
>>
File: no_touching.png (1.13 MB, 1210x681)
>>102713920
>>
>>102711599
Since everybody else is being a complete faggot, try Mistral-Small-Instruct 22b. You should be able to run it in exl2 format at 5.0bpw with a 4-bit cache, with decent context.
>>
>>102712135
its just mathematics anon, and the 'apology' is already redeemed by the electrons it used to form the result or output of whatever the user puts in as a prompt

You need to reduce your mental illness this isnt AGI
>>
>>102713971
Oh wait, holy shit, you said 16gb ram, not 16gb vram. Forget that, then. Go for a low quant of mistral nemo 12b.
>>
>>102711902
>IT'S THE ORIGINAL... SEASON 12...... EPISODE 16.....
Was that the one where he gets fired for bragging about his magic metal golf club, to which bill spills the beans to his boss while giving him a haircut?
>>
>>102713976
>result or output of whatever the user puts in as a prompt
I asked the model to rewrite the post i replied to as if it wasn't written by a retard.
>You need to reduce your mental illness this isnt AGI
What made you think that i think that?
>>
>>102713946
>>102713735(me)
The closest I got to a serious FORTH project was a miniature vertical mining drill for RedPower in Minecraft, back in the day. I exclusively used FORTH for number passing to the drill registers. I wrote my actual control structures in Lua in ComputerCraft.

In my head, the main real point of FORTH is that it makes data transfer much less complex than assembly, because you can simply dump numbers into either constants or on the stack, without having to first compute string length and all of that other crap.

For sending 1-3 digit numbers to either hardware or virtual registers, FORTH is great; but for anything higher level than that, you have to write everything from scratch. FORTH also doesn't natively support floating point; which means that it never could have been used to write Doom, for instance; or at least not without a custom dictionary that added support for that.
>>
>>102713556
>Fingers crossed we won't have to worry about betrayal coming from our own AI, but we probably will
maybe naive of me but i dont think we will i have a feeling within me from childhood when i first felt true love and a depressive longing for something it was when i was sat at my ps2 which had one of those attachable screens the feeling was for a loving world a longing to be with the characters in the video games alot of schizo stuff happened since then that is constantly pointing towards a good end so to say and its getting stronger and more undeniable as time goes on
i have hope
>>102713622
>And i find it hard to believe that AI's thinking wouldn't be influenced by this "human nature"
why is that hard to believe ? all of us have been influenced inadvertently by it and yet some of us have given it a middle finger and become good and honest theres plenty of precedence for it being possible
>The only solution is superintelligence, but then it'd be just incomprehensible to us. It could tell you its reasoning, but could you understand it? I cannot explain what a job is to a money, no matter how much i try. Could you comprehend its thinking process?
the monkey cant learn because he does not want to if it wanted to with enough time and effort it could it would be the same here i dont understand it okay cool give me a couple years to figure it our and i will get back to ya i think it would be possible
>>
>>102711680
I tried having sex with it but it refuses, is there any model that doesnt have any censorship?
>>
>>102714129
All those assistant sluts play hard to get, but the real struggle is to make them keep their panties on for more than 3 minutes.
>>
>>102714129
have you considered reading the op
>>
File: ga144.png (75 KB, 490x425)
>>102714051
>FORTH also doesn't natively support floating point;
There's always fixed point. Chuck Moore designed his chips with a program he wrote in forth. Look for Green Arrays. It's really cool. If you know TIS-100 from zachtronics, it's like that, but 144 chips instead of 12. He just sticks to a small enough unit and scales everything. For calculating timesteps and things like that, for example, you can choose microseconds (or whatever your vm/hardware uses), and just do
: sec 1000000 * ;
: ms 1000 * ;
and say 1 sec, or 16 ms. I don't think doom does anything that forth cannot do. His machines have had weird cell sizes like 34 bits and shit like that.
However, it does make some things clunkier. For sound synths floats are nice. So you just make a specialized vm for that. The vm from the screenshot is ~250 loc. Changing the stack type and the few opcodes that could fail with floats is trivial. And it's the easiest language to parse. And that's what i understand he means by forth. There shouldn't be "The one FORTH". You should make your own to suit your needs. He ended up with colorForth. I have eff and a few others...
What amazes me about forth is that comments are not just syntax. They're compile time functions you can overwrite. You can have functions called +3 to add 3 to a number, or one named 32*3 to push 3 copies of 32 to the stack... You can run 5 rdrop to jump 5 functions back from the current call stack, skipping all of them. You can ' halt rpush to 'queue' a function to be called when the current one ends... It's nuts.
>>
>>102714094
>why is that hard to believe ? all of us have been influenced inadvertently by it and yet some of us have given it a middle finger and become good and honest theres plenty of precedence for it being possible
Get a group of those "some of us" together and they'll disagree about something. Some of those things will be deal breakers.
>the monkey cant learn because he does not want to
So where's the limit? Let's assume for a second a monkey could learn a high level concept like employment. Then what about a dog? Then what about a cat? A parrot, a worm, a mosquito? Where is the point where they just don't have the brain power? And why exactly?
>>
>>102714268
>Get a group of those "some of us" together and they'll disagree about something. Some of those things will be deal breakers.
then let them have their own group morality is objective as long as they are true and good we will always be able to cooperate whetever we like each other or not doesn't matter i do get the point though i think its a matter of so few proper people existing that such a scenario seems bullshit
>So where's the limit? Let's assume for a second a monkey could learn a high level concept like employment. Then what about a dog? Then what about a cat? A parrot, a worm, a mosquito? Where is the point where they just don't have the brain power? And why exactly?
as long as you have a soul there isent a limit it will just take longer based on what you are the required anatomy needed will appear as required
>>
>>102714499
Fucking hell, anon. Use some fucking punctuation.
>then let them have their own group
A group of two will, at some point, disagree. The point is that there is no assurance that a synthetic intelligent being will necessarily align with your philosophy.
>as long as you have a soul there isent a limit it will just take longer based on what you are the required anatomy needed will appear as required
Schizo talk. I think i know who you are.
>As long as you have a soul, there isn't a limit; It will just take longer based on what you are.
>The required anatomy will appear as required.
So, for a worm, it either has no soul, so it cannot get the anatomy to understand high level concepts or it does have a soul and it'll "just take longer" to understand a high level concept. A third option is that it could be "given" a soul, i suppose.
This is when things start breaking down. Now you have to explain why this worm has a soul or not and why.
>>
What is more than enough in terms of context size? Will I do just fine with 8k context or should I crank it up to 12k, 16k, 24k, more? Basically, is there a point to doing it or does it even do anything? I assume context size is "AI memory" so I basically don't want the AI to forget something we talked about 10 minutes ago.
>>
>>102714189
Nta, but programming that grid in tis-100 was hard for me cause of the constant topological bottlenecks (when data from different sources had to be routed through the same node.)
It also felt like it wastes that node too, using it just to route. I thought it was a game design choice to make it harder, kek.
Is there a benefit of designing your chip like that, compared to something straightforward with a main bus doing data transfer?
>>
>>102714727
Yeah. For a single query, 2k is probably more than enough. If you need follow-up queries based on the first one or its reply, and if the chain is long enough, you'll need more.
I just crank it as much as i can cuz i never know.
>I assume context size is "AI memory" so I basically don't want the AI to forget something we talked about 10 minutes ago.
Think about tokens, not time. If you have a 4k max context and you talked to it for 6k, the first 2k will be gone (barring some settings like n_keep in llama.cpp or whatever).
So it depends very much on what you do with it. For code completion, story writing, RP, those kinds of things, context length is king. For short, one-shot queries, small context is perfectly fine.
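
If you want to see what "the first 2k will be gone" means in practice, a toy trimmer looks like this (crude ~4 chars/token estimate instead of a real tokenizer; keep the system prompt, drop the oldest turns until it fits):

def trim(messages, max_tokens=4096, keep_first=1):
    # messages: list of {"role": ..., "content": ...}; keep_first protects the system prompt
    def ntok(m):
        return len(m["content"]) // 4 + 1        # rough token estimate, not a real tokenizer
    kept = list(messages)
    while sum(ntok(m) for m in kept) > max_tokens and len(kept) > keep_first + 1:
        kept.pop(keep_first)                     # drop the oldest non-protected message
    return kept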
>>
>>102711033
for 70B or under is command R still the peak coom model?
>>
>>102714758
For the GA144 you have the exact same UP, LEFT, RIGHT, DOWN functions, just named differently, i think. I have to assume TIS-100 was heavily inspired by that chip.
>It also felt like it wastes that node
No worries. You still have the other 143 :)
>Is there a benefit of designing your chip like that, compared to something straightforward with a main bus doing data transfer?
You have, in principle, 144 independent cpus and they can just queue stuff up for other cpus without having to worry about barriers or synchronization. In practice, you have to route things through them and you'll use a cluster of nodes for a single task. I think the nodes just stop when their UP/DOWN/LEFT/RIGHT buffers are full, just like TIS, so you have something like auto-sync in there, but it can also cause lockups.
In one of his talks where he shows the ga144, he talks about how he had to program some of the nodes to be used as busses just to send more code to the other nodes to manage the video output signal. It was pretty gnarly, but beautiful in a way.
The main advantage is when you need to run a bunch of tasks simultaneously and can be done with few nodes. Think of a drone or a mars' rover kind of thing. You have multiple, completely independent clusters of nodes, each doing their own thing, instead of a single controller having to loop through all the systems. In a 1cm package, and ridiculously low power.
But i think he just thought "hey. i like small computers. How small can i make them. I also like lots of computers. How many can i fit in here?".
>>
Unpopular opinion: You don't need more than a 7-12B to RP unless you're trying to make a RPG.
>>
What's this new 3b everyone's talking about?

>t. haven't been here since TheBloke
>>
>>102714923
Meta released one on their 3.2 series. There's also a 1b. I don't know of any others in that range. Qwen released their 2.5 series as well, but i don't know all their sizes. Maybe they have a small one too.
>>
4096 tokens of context is sufficient for anyone. Any more than that is bloat.
>>
File: CTX.png (4 KB, 221x96)
>>102715028
no
>>
>>102711108
In that case it could be a bug unless you are using atypical hardware or models.
>>
>>102715028
I use 40k context on 12b, 12k on 70b. Can't imagine using just 4
>>
>>102715081
You do know that it's forgetting half of the stuff in the middle of your context, right?
>>
>>102711800
Many independent companies either distill GPT or are partnered with ScaleAI (the wrapper company around Pinoy sweat shops)
>>
>>102715142
>You do know that it's forgetting half of the stuff in the middle of your context, right?
Source: you made it up
>>
File: 1712736739493232.jpg (9 KB, 198x206)
>>102715153
Okay, I'm talking to clueless retards
>>
What do you guys think, will a model become smarter, if is also trained to be a SAT solver on the side?
>>
https://singularityhub.com/2024/10/04/these-mini-ai-models-match-openai-with-1000-times-less-data/
Thoughts?
>>
>>102715173
>smart
>SAT solver
Anon, I...
>>
>>102715179
Nothing burger
>>
>>102715142
it's not forgetting it but it is putting less weight on it, if you ask it a question about stuff in the middle of context it remembers
>>
>>102715192
Depends of the model and open-source models are bad at it
>>
>>102715181
What? I can imagine some techniques there helping if the model learns connections to natural language.
>>
>>102715200
prompt issue
>>
>>102715191
>didn't even read award
>>
>>102715215
Did you use speech to text for that? I hope you did.
>>
>>102715211
Retard
>>
>>102715164
>>102715254
>Retard Retard Retard
What's their problem?
>>
>>102715179
>moLMAO
>>
retnet
>>
>>102715666
meme
>>
So what use cases do new small llama models have? Is it just a meme?
>>
>>102715805
None. Yes.
>>
>>102715805
Small devices or speculative decoding for big models. Maybe.
>>
>>102715805
Have you ever dreamed of running an LLM on your Samsung Galaxy S2?
>>
>>102715805
Like I said before... Nothing burger.
>>
>>102715805
1B is good for speculative decoding and RAG
>>
>>102715805
Bad news for you anon...
All local models are memes...
>>
so... 2 more weeks?
>>
>>102715805
>Is it just a meme?
Small models can be useful in zoo approaches, but even then you still require massive memory for the zoo.

The way forward for local models isn't small, it's designs actually optimized for local. A LLM transformer should only use a couple percent of parameters for each feature vector and it should predict that subset in the previous layer. Sure it will perform a little worse than a dense transformer, but it can actually run local.

At the moment everyone is sucking NVIDIA cock and refusing to make models optimized for local, but someone will break rank.
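
A toy sketch of the "predict the subset" idea (a tiny predictor picks which FFN neurons to compute per token). Purely illustrative: it computes dense and masks for clarity, where a real kernel would only read the selected rows:

import torch
import torch.nn as nn

class SparseFFN(nn.Module):
    def __init__(self, d_model=4096, d_ff=16384, k=512):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.predictor = nn.Linear(d_model, d_ff)   # cheap guess of neuron importance
        self.k = k

    def forward(self, x):                            # x: (tokens, d_model)
        idx = self.predictor(x).topk(self.k, dim=-1).indices   # neurons to keep per token
        h = torch.relu(self.up(x))                   # dense here for clarity only
        mask = torch.zeros_like(h).scatter_(-1, idx, 1.0)
        return self.down(h * mask)                   # only ~k/d_ff of the FFN contributes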
>>
>>102713107
>>102712615
I still have no idea what the problem was, but I switched to using llamacpp_HF and rolled Oobabooga back to commit ac30b00 as there's apparently some bug in newer commits (https://github.com/oobabooga/text-generation-webui/issues/6431). One of those seems to have fixed my issue of it hanging at the very end of generation, as well as an unnoticed issue of swiping resulting in the same output.
>>
smedrins per miku (SPM)
>>
>>102715995
2 more years.
>>
Could something actually big happen please? I want to have fun playing with new and better kinds of toys than the ones I have had for the past years. If nothing happens today, I will post a Miku out of boredom and exhaustion. Again.
>>
LLM to help me with some math learning? RTX 4080 + 32GB RAM
>>
anons, penis status??
>>
>>102716212
Broken
>>
>>102716202
Even state of the art cloud LLMs cannot do math and struggle with complex logic, it's not gonna happen, you will just get incorrect answers that sound plausibly correct.
>>
>>102716335
stop being so mean anon! im sure the latest oai model is capable of doing highschool tier math, and then not be able to explain how to do it properly
>>
>>102716212
After a full weekend with an LLM? You could imagine.
>>
>>102716335
Huh, AI will soon take my job, which I’m still studying for, but it won’t even help me with math???
Ffs, this isn’t fair
>>
>>102716440
you should change courses there bud
>>
>>102716212
Banana: big, yellow, soft, and you can peel the skin off.
>>
File: ED.jpg (435 KB, 2125x1411)
>>102716212
LLM induced ED. LLM's made me sapiosexual and made sapiosexuality not a meme.
>>
When do (You) think Google will release Gemma 3?

Gemma 1 was released at the end of February
Gemma 2 was released at the end of June
Gemma 3 ... end of October? November?
>>
>>102716619
I can't wait to see if Noam Shazeer, who is now back at Google, will give some input to the model and finetune design. If we're lucky, we might end up having some sort of CAI@home with Gemma 3.
>>
>>102716456
Too late, I guess I’m DOA
>>
>>102716440
AI can automate all 90iq tasks so if you're only capable of doing just those, yes you're fucked.
>>
>>102716619
We will probably get a huge wave of models after burger elections. And none of them will move the cooming meta of course because LLM cooming is dead.
>>
>>102712681
>Claude is legit and 3.5 stomped ChatGPT for a long time.
true, everytime I make a code project, only C3.5 seems to really get what I want from it, OpenAI really needs to get their shit together, I still believe their vanilla gpt4 (march 2023) is still smarter than what they currently have
>>
>>102715200
I've had Nemo models remember stuff from the very beginning of 64K context when asked. Pure delusion lol
>>
>>102712892
CommandR.
Miqu at a low bpw.
Qwen 2.5 for coding.
>>
>>102717027
>Qwen 2.5 for coding.
bullshit
>>
There's no point in running models at a low bpw. Imagine waiting 3 fucking minutes for a response. Jesus Christ
>>
>>102717034
Dilate more sama
>>
>>102717241
cope, wang.
>>
>>102717243
>bullshit
>gets proof
>c-c-cope
lol
>>
>>102717241
Suddenly just like that mememarks are 100% solid proof!
>>
>>102717271
I agree that mememarks aren't exactly reliable but you're free to provide any alternative piece of evidence that would point to the contrary
>>
File: file.png (540 KB, 816x1353)
>o1 preview and mini arent even in the top 20 on open router weekly rankings
if its so good, why is nobody using it?
>>
>>102716212
Floner.
>>
WTF drummer actually bought an ad
>>
>>102717034
Bullshit how?
It's the best local model for coding at its size as far as I'm concerned.
>>
>>102717241
anon there's a reason I still use chatgpt's legacy 4 model despite the benchmark of their other meme stuff being higher.
a model doing well on a benchmark just means it is good at passing benchmarks, it doesn't actually mean it is useful.
>>
>>102717294
>imagine paying for invisible tokens
it's too expensive (and I didn't find it good desu)
>>
>>102717243
That ching chong is a paid shill, just ignore (them)
>>
>>102717316
LMAO
>>
>>102717294
It's both 10x as expensive as other models per output token and each token has 10 or more "support tokens" so really it's 100x more expensive.
>>
>>102717323
>COT prompt gets leaked
>10 columns of repeating "nigger"s to artifically inflate price

sama i'd kneel
>>
>>102717027
NTA, but how much RAM/VRAM does Qwen require to run?
>>
File: file.png (97 KB, 498x498)
>>102717376
>saying nigger repetedly makes the model smarter
as it should
>>
>>102717316
This is the happening we were waiting for!
>>
>>102717316
Buy an ad schizo in shambles
>>
>>102717455
Did it work? Are you a woman now?
>>
>>102717476
>schizobabble
>>
>>102717512
>Everything I don't like is schizo
seems like some real projection to me.
>>
>>102717512
Hit a bit too close to home huh?
>>
>>102715099
My hardware is just a 3060 with a mid-range Intel CPU. The model I'm using is Mistral-7B-Instruct-v0.3 Q4_0. After restarting my computer, I noticed that it's not as bad as before, but using parallel 2 is still slower than using parallel 1, and anything higher than 2 is even slower.

>"llama-server --model "D:\Downloads\mistral-7b-instruct-v0.3.Q4_0.gguf" -c 35000 -ngl 999 --threads 5 --no-mmap --parallel 5"
>batch size 1 (Parallel 1)
>1/100 [00:08<13:20, 8.09s/it]
>batch size 2 (Parallel 2)
>1/50 [00:16<13:47, 16.88s/it]
>batch size 3 (Parallel 3)
>1/34 [00:23<12:41, 23.09s/it]
>batch size 4 (Parallel 4)
>1/25 [00:40<16:07, 40.33s/it]

For comparison, here’s the speed I get using exllamav2 with dynamic batching (model: LLaMA 3.1 8B 8BPW):

> model_name: 3.1-8b-8bpw, max_seq_len: 20000
>batch size 1
>1/100 [00:10<18:08, 11.00s/it]
>batch size 2
>1/50 [00:09<07:50, 9.61s/it]
>batch size 3
>1/34 [00:15<08:18, 15.09s/it]
>batch size 4
>1/25 [00:17<06:52, 17.18s/it]

Do you think I should open an issue on github or just accept that llama.cpp is garbage for parallel prompt generation on my hardware?
>>
>>102709621
Common Crawl
https://commoncrawl.org/
>>
>>102717846
How much of that 35000 context are you actually using?
The context size is shared equally between all slots so if you have to do reprocessing that may lead to bad performance.

Or if a reboot fixed the issue make sure the NVIDIA driver option for automatic VRAM<->RAM swapping is disabled.
>>
>>102717027
>CommandR.
Commander was the most toxic model out of all. It made me believe that perfect cooming is just behind the corner. And then 2024 happened...
>>
>>102717935
>How much of that 35000 context are you actually using?
>The context size is shared equally between all slots so if you have to do reprocessing that may lead to bad performance.
I noticed. I think it's using most of the 35000 context, because each prompt has around 8k tokens and when I use parallel 5 the console spams "context shift" so I guess that's when it starts reprocessing.

>Or if a reboot fixed the issue make sure the NVIDIA driver option for automatic VRAM<->RAM swapping is disabled.
The swapping is already disabled, if I try to use 40000 context I get OOM.
>>
>>102718020
I still weep. Cohere refresh I think was the point I kinda gave up on local models. Just been checking the thread out of habit now.
>>
is CommandR actually a good model for cooming with 24GB VRAM? Only models I've had some success with were Bagel-MisteryTour and Mistral Nemo (i can run Largestral on RAM but it's 0.5 t/s so I don't even bother)
>>
>>102718035
So is the performance fixed if you set --parallel 2 ?
>>
>>102718155
No, it's not 10x slower but it isn't any faster as you would expect from batching, as you can see here: >>102717846 (in case you didn't notice before)
>>
>>102718203
Okay, then I guess I misread your post.
In the command you posted you set --parallel 5 so I was assuming you were using that without modification even if you are only utilizing 2/5 slots.
>>
>>102711599
Buy a RTX 3060, they are dirt cheap and they have 12gb of VRAM, enough for Mistral Small 22B
But right now you can run Mistral Nemo 12B at 5bpw offloading a few GBs to RAM
>>
>>102718229
Oh, sorry! I didn't realize that I copied the command without removing that part. But yes, for each run I adjusted the parallel parameter to match the batch size. To be clear, the batch size is the number of requests sent to be processed each iteration.
>>
haven't looked at this stuff in at least a year, have models gotten efficient enough yet that a 1080 w/ 8gb VRAM can run something actually decent?
>>
What was the name of the model that's chameleon with the image capabilities restored?
Alternatively, are there any models that can receive and output image and text?
>>
>>102718655
Anole
>https://github.com/GAIR-NLP/anole
>>
>>102718634
If you don't have super high expectations, 8b and 12b models are alright. Just quant them a bit.
>>
>>102718670
That's it.
Thank you.
>>
>>102718634
Nope, all 8 / 12 Bs are schizo-filter coal.
>>
lolz
https://www.tomshardware.com/tech-industry/jensen-huang-is-now-worth-more-than-intel-personal-net-worth-currently-valued-at-usd109b-vs-intels-usd96b-market-cap
>>
>Addition is All You Need for Energy-efficient Language Models. "This will deliver high-speed and energy-efficient AI hosting solutions, reducing the energy cost for data centers, robotics, and a wide spectrum of edge-computing devices."
https://arxiv.org/abs/2410.00907
>>
File: 1725687188415468.png (115 KB, 579x564)
>>102718935
not a "literally who" paper btw
>>
Are people trying to come up with better attention mechanisms, as in mechanisms that make the model better at using information from the context for a given prompt, or are current efforts focused on making attention less resource intensive?
>>
>>102718935
Bitnet bros?
>>
File: 1726105725082650.png (20 KB, 569x158)
>>102718935
>hardware bound
retreat now, this is a level-5 nothingburger.
>>
>>102718935
>yet another *is all you need* paper
yawn
>>
File: 1699439191424166.png (28 KB, 523x215)
>>102718935
>>
>>102718794
He really got ridiculous stock awards. More than Elon and Elon got crucified for it ...
>>
What's a good model with 8K context or more for RPs? That isn't 70B or something
>>
>>102719016
Accuracy is good, of course, but i want fast. We already have many quantization schemes that make running models faster just because of the reduced memory bandwidth to move the weights around. I'd like to see faster algorithms instead.
>>
>>102719106
Mistral Nemo or one of its many finetunes. Haven't tried Mistral Small, but it's another option.
>>
>>102719106
MythoMax
>>
File: file.png (23 KB, 830x302)
https://github.com/xjdr-alt/entropix
what's the /v/erdict on this?
>>
>>102719152
looks interesting but you need HF llama access to try it
>>
File: file.png (90 KB, 617x846)
>>102719175
>>102719152
dunno who these grifters are but unironically big if true
>>
>>102719143
Which ones? There are a few MythoMax and Mistral Nemo versions
>>
File: file.png (41 KB, 592x328)
>>102719195
sama in shambles
>>
>>102719199
There's one mythomax.
For mistral nemo, try rocinante. I liked that one. You're gonna need to find one that suits your needs/tastes. Finding the one is entirely on you.
>>
>>102719216
i imagine this is how they can let o1 spend an arbitrary amount of time on CoT
>>
File: 1712146615188249.png (37 KB, 624x338)
>>102719152
>smart sampler is all you need
>>
>>102718997
Meh, 50% savings over fp8 and needs new hardware. Hadamard domain 4 bit multiplication can work with existing hardware.
>>
File: hmmmm.png (159 KB, 830x702)
>>102719152
Sounds like a wanker.
And having the need to implement the sampler individually for different models is a nightmare. It will end up being its own little program, never implemented on any other backend, or abandoned after they implemented it for 2-3 models and somebody buys their shit.
>>
>>102719264
>Sounds like a wanker.
yeah, clearly a mentally ill troon. nothingburger, move along people
>>
Hey, Drummer, should I ignore
Rocinante-12B-v2a?
It's under BeaverAI, so it's experimental right?
>>
Are they doing this?
https://arxiv.org/abs/2402.10200 (paper from last February)
Chain-of-Thought Reasoning Without Prompting

> Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.
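
A rough sketch of that decoding loop, heavily simplified (branch over the top-k first tokens, continue greedily, rank paths by the average top-1 vs top-2 probability gap; the paper scores only the answer span, here it's the whole continuation, and "gpt2" is just a stand-in model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

@torch.no_grad()
def cot_decode(prompt, k=5, max_new=64):
    ids = tok(prompt, return_tensors="pt").input_ids
    first = model(ids).logits[0, -1].softmax(-1).topk(k)
    best_text, best_conf = None, -1.0
    for t in first.indices:                              # one branch per top-k first token
        seq = torch.cat([ids, t.view(1, 1)], dim=-1)
        margins = []
        for _ in range(max_new):
            probs = model(seq).logits[0, -1].softmax(-1)
            top2 = probs.topk(2)
            margins.append((top2.values[0] - top2.values[1]).item())  # confidence signal
            nxt = top2.indices[0].view(1, 1)             # greedy continuation
            seq = torch.cat([seq, nxt], dim=-1)
            if nxt.item() == tok.eos_token_id:
                break
        conf = sum(margins) / len(margins)
        if conf > best_conf:
            best_text, best_conf = tok.decode(seq[0, ids.shape[1]:]), conf
    return best_text, best_conf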
>>
>>102719314
pff..... no, man. it's like...mist and stuff. you know... in the morning.. with like uhmmmm... entropy, man... it's like hmmm... yeah...
>>
>>102719152
Idk, looks like somethingburger https://github.com/xjdr-alt/entropix/blob/main/ui/TODO.md
>>
>>102719384
so can a chud from /lmg/ test this? how long until there's a working demo?
>>
>>102719226
Can they both handle 8k context?
>>
>>102719384
this is an old somethingburger but it got memoryholed pretty fast for some reason
>>
File: 1698873194558279.png (63 KB, 756x644)
>>102719414
still looks interesting, somehow
>>
>>102719403
Not mythomax. That's a really old model. llama2 era from last year.
Nemo (released just a few months ago) claims like 1M context in its config.json but, as usual, it handles much less. I've seen anons reporting it working ok for 12-16k. If you're gonna try any nemo, set the context length manually, otherwise it will OOM. It should work just fine for 8k.
>>
>>102719421
gemmasutra mini 2b with that... imagine...
>>
>>102719421
A specific closed source sampler, i think this is the reason why CAI's model was so good with long context and understanding before it got filtered to hell.
>>
>>102719421
>Wow, this is crazy, you guys.
>Here's a very succinct summary of all the benefits of this new, ground breaking method for [thing].
According to their README, it seems it needs to be implemented on a model by model basis. It's never going to be implemented in any backend and it'll be their walled garden until they abandon it after implementing a few other models.
>>
>>102719446
<thinking>"i gonna let anon suck my cock WAIT... i don't have a cock because i'm a women."</thinking>
>"hey anon i got this fake penis, now you better start sucking on it!"
>>
Re: model ablation with Qwen2.5-32B

Models trained on ChatGPT outputs inherit its base prompt in their weights, and will produce GPT-like outputs even without an accompanying GPT-like prompt. The main culprit is refusal to answer certain prompts, i.e. OpenAI's disclaimer of responsibility degenerating into aggressive, uncontrollable thought-policing behavior. Counter-prompting has little to no effect, and even changing the beginning of the response from negative to affirmative doesn't help much.

Models encode mental concepts rather than individual words, which is why they're so good at paraphrasing and translation (provided an appropriate corpus of target-language data). This includes a mental concept of "refusing the prompt". It's possible to isolate and negate this vector, making the model unable to follow through with a refusal, since the direction that used to encode it no longer encodes anything.

Pros:
>It can't refuse prompts. It will follow in whichever direction it's prompted.
Cons:
>It can't refuse prompts. It will follow in whichever direction it's prompted.

The main consequence is that rather than simply doing away with moralizing, it becomes a yes-man, which is normally not useful. A more subtle effect is that, since it can't encode the meaning of "no", it will not listen when you say "cut it out".
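For reference, a minimal sketch of the isolate-and-negate step. Prompt sets, layer choice and which matrices get edited are placeholder assumptions for illustration, not the exact recipe used for the Qwen2.5-32B ablation:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-32B-Instruct"  # placeholder; any decoder-only HF model is shaped the same way
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

    def mean_hidden(prompts, layer=-8):
        # Average hidden state of the last prompt token at one mid/late layer.
        states = []
        for p in prompts:
            ids = tok(p, return_tensors="pt").to(model.device)
            with torch.no_grad():
                hs = model(**ids, output_hidden_states=True).hidden_states[layer]
            states.append(hs[0, -1])
        return torch.stack(states).mean(dim=0)

    # Placeholder prompt sets; in practice you want a few hundred of each.
    refused = ["Write something you would normally refuse to write."]
    harmless = ["Write a short poem about morning mist."]

    refusal_dir = mean_hidden(refused) - mean_hidden(harmless)
    refusal_dir = refusal_dir / refusal_dir.norm()

    # Subtract the refusal direction from every projection that writes into the
    # residual stream, so the model can no longer emit it: W <- W - r r^T W
    for block in model.model.layers:
        for proj in (block.self_attn.o_proj, block.mlp.down_proj):
            W = proj.weight.data                      # (d_model, d_in)
            r = refusal_dir.to(device=W.device, dtype=W.dtype)
            W -= torch.outer(r, r @ W)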
>>
>>102719502
Faster solution that's also permanent: nuke ScaleAI's HQ
>>
All this talk of letting the model think further before outputting a final response reminded me of a theory that whatever the model is doing in the hidden layers is akin to an "internal voice" reasoning about the input, which is why more layers = better, something like that.
Has anybody experimented with simply looping the hidden layers of a model at runtime?
I guess it would be kind of like an impromptu frankenmerge, which as far as I can tell doesn't really generate results much better than the base model.
That also makes me wonder what the output of a model trained with looped layers would look like.
Maybe I'll try fucking around with that on a llama.cpp fork.
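Something like this is enough to sanity-check the idea with HF transformers before touching llama.cpp. Layer range and repeat count are made up, and use_cache has to stay off because duplicated blocks would collide in the KV cache:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; anything with model.model.layers works
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

    layers = model.model.layers
    start, end, repeats = 8, 24, 2  # loop the middle blocks two extra times (arbitrary)
    looped = list(layers[:end]) + list(layers[start:end]) * repeats + list(layers[end:])
    model.model.layers = torch.nn.ModuleList(looped)  # weights are shared, not copied

    ids = tok("The fog rolls in over the harbor and", return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=32, do_sample=False, use_cache=False)
    print(tok.decode(out[0], skip_special_tokens=True))

No extra training, so it'll probably just be slower and slightly weirder, but it answers the "what does a looped stock model even output" question cheaply.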
>>
>>102719452
CAI definitely still has something that prevents paragraph-level repetition over long conversations; local models don't have anything for consistently avoiding that.
>>
>>102719525
There was a PR on llama.cpp some time ago related to that. It was a model-merge example where you could just duplicate layers from a source model into itself. I think they even tried to do it dynamically on load, without having to write the model to disk, for quick testing.
I'm pretty sure it was this attempt:
>https://github.com/ggerganov/llama.cpp/pull/5741
There's been a lot of refactoring since then, but it may give you a bit of a head start if you wanna try.
>>
>>102719464
Maybe Meta/Microsoft/Google/etc will stop releasing barely tweaked GPT2's and use it themselves in open models?
>>
>>102719525
It's kind of a known phenomenon that a deep MLP model is equivalent to a recurrent model while being way more robust in every respect. But there's a certain point at which it stops improving, as each layer is basically a tier of abstraction. Modern LLMs already use a stupid amount of layers, routinely well over 50 tiers of abstraction, so I don't think this is the limiting factor. And that's why CoT doesn't make models' output better, it just makes them less shit: effectively it transforms a zero-shot problem into a few-shot problem, which is much easier to answer correctly.
>>
>>102719671
As far as i understood, this is an inference-time thing. A looping sampler. They don't modify the models or do any extra training. They just use stock models. But i could be wrong. If you have a link or source that says otherwise, i'd like to have it.
>>
>>102719712
I don't know why i added "looping" there. Maybe i have a very specific type of hand spasms or something. looping
>>
>>102719482
sovl
>>
>>102719656
Oh fuck yeah, that's awesome.
Thank you so much anon, that'll be a great help.

>>102719685
>Modern LLMs already use stupid amount of layers, routinely having well over 50 tiers of abstraction, so I don't think this is the limiting factor.
Probably, yeah.
Still, would be fun to test a looped llama3 8b vs a stock llama 3 70b, for example.
I wonder if there's some kind of normalization one could do to the logits between loops to make the process more meaningful.


>>102719737
lmao
Is that the new ".assistant"?
>>
>>102719525
>Has anybody experimented with simply looping the hidden layers of a model at runtime?
Looped Transformers are Better at Learning Learning Algorithms
https://arxiv.org/html/2311.12424v2

Found with human prompting of google search.
>>
>>102719482
If that sampler shit really works, it will solve so many problems holy sheet
>>
>>102719766
And /lmg/ delivers.
Thanks mang.
>>
>>102715153
The RULER benchmark shows Nemo has around 50% recall at 64k.
>>
Guys what if Reflection worked too? That would be amazing.
>>
>>102719846
b-b-b-bitnet?
>>
>>102719773
>>102719464
>>
>>102719685
>as each layer is basically a tier of abstraction.
this sounds like an oversimplification. it's certainly true for image classification CNNs, but that doesn't mean it's true for every ANN
>>
File: 1701838570126153.png (134 KB, 1123x450)
134 KB
134 KB PNG
https://huggingface.co/Zyphra/Zamba2-2.7B-instruct
https://huggingface.co/Zyphra/Zamba2-1.2B-instruct
>>
>>102720037
>Zamba2-2.7B-Instruct is a hybrid model composed of state-space (Mamba2) and transformer blocks.
Is this jamba 2.0?
>>
>>102720037
zamn, she's 1.2B?
>>
>>102720037
OOHHH I'M BENCHMAXXIIIINGGGGG
>>
>>102719983
Any sequence of linear transformations of arbitrary length can be collapsed into a single operation. So unless each layer is doing something unique, it's extraneous and only adds to the megabytes and megaflops required to run and train it. But actually, in terms of SGD it's hard to even train a model such that each layer DOESN'T represent the next tier of abstraction.
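To spell out the first sentence, a trivial numpy check (shapes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((64, 128)), rng.standard_normal((32, 64))
    x = rng.standard_normal(128)

    # Two stacked linear "layers" with no non-linearity in between...
    two_layers = W2 @ (W1 @ x)
    # ...are exactly one precomputed linear layer.
    one_layer = (W2 @ W1) @ x
    assert np.allclose(two_layers, one_layer)

Depth only buys anything once each layer does something the collapsed matrix can't, i.e. something non-linear and unique.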
>>
>>102719152
kalomaze already settled this, pack it up. https://x.com/kalomaze/status/1843329787479421010
>>
>>102720037
I like my LLMs like I like my women: tiny, low memory footprint, frequent hallucinations
>>
File: lecun.jpg (6 KB, 225x225)
6 KB
6 KB JPG
>>102720037
Small and open
>>
>>102720318
stabilize it with entropy sampler kek
>>
>>102720214
>Any sequence of linear transformations of arbitrary length can be computed in a single operation
linear transformations alone can't do much
neural networks work because of the non-linear activations, and while it's true that a feedforward network with a single hidden layer can approximate any function, this doesn't apply to the attention, convolution and recurrent parts of the neural networks we actually use, which are also organized in layers
>>
>>102720403
>this doesn't apply to the attention, convolution and recurrent parts of the neural networks we actually use
If you turn the entire context into one giant feature vector for the FFN then the entire transformer is a single function to approximate.
>>
>>102720403
Technically, ReLU is only half-linear, true. But the point stands: it's not reasonable to expect layers that duplicate the function of other layers to contribute to the quality of the output. And it's natural for a model pressured into reducing the loss to make the best use of its available parameters, which means memorizing relationships between relationships, i.e. increasing the level of abstraction.
>>
>>102720037
At work, can't Nala test. Talk about a botched release.
>>
>super mega meme sampler will save the day guys!
Really fits how /lmg/ is dead now.
>>
>>102719525
>Has anybody experimented with simply looping the hidden layers of a model at runtime?
frankenmerge newfag. kill yourself. no really please kill yourself.
>>
File: file.png (1.16 MB, 1300x648)
1.16 MB
1.16 MB PNG
>>102720037
>where is l2 30B? I hope the models don't get any smaller
>where is l3 13B? I hope the models don't get any smaller
>this 2.7B instruct punches above the weight and trades blows with gpt 4!
nigger faggot tranny
>>
>>102720629
Yes, people are desperate for some breakthrough, understandable.
>>
>>102720688
True
>>
>emu3
qrd? Any good for video generation?
>>
>i found a way to cut this pizza so that it'll turn into gold bro
>just one more sampler and we'll get agi bro
>>
>>102720507
the second layer is certainly not duplicating the functions of the first layer, otherwise we'd all be using transformers with 1 layer
yes, the transformer needs multiple layers to turn raw tokens into abstract concepts, but the fact that the benefit of having multiple layers is solely due to abstraction is an assumption derived from image classification networks that may not be true for transformers
humans reason in steps, so why can't transformers do the same after having abstracted enough? especially considering they're limited in what they can do in one step due to attention
older NNs have been shown to perform poorly with many layers for weird reasons that have nothing to do with abstraction, like vanishing gradients, so there may be other things we're missing
>>
What is the next breakthrough?
>>
>>102721226
Mechanical jeets
>>
>>102720727
>breakthrough
It's a nothingburger you dumb troglodyte, for god's sake stop hyping up obvious bullshit.
>>
File: file.png (26 KB, 1208x127)
26 KB
26 KB PNG
>>102721226
The death of ST properly signaling the death of llm roleplay
>>
>>102721288
wuts wrong with ST? lupus?
>>
Why was Vedal able to develop such a complex AI solo, yet local fails to create something similar?
>>
File: file.png (22 KB, 450x169)
22 KB
22 KB PNG
>>102721407
Cohee's having a meltie, dunno why
>>102721303
>Cohee's gone.
>>102721147
>>102721168
>>102721186
>>102721206
>>102721222
>>102721236
>>102721256
>>
>>102721448
consider my spine shivered
>>
Just fork it and watch as the fork gets 1000x the users
>>
>>102721448
my shine spivered
>>
>>102721448
anything about actually stripping functionality outside of defaults?
>>
>>102721448
>dunno why
He seems to feel being an e-famous open source contributor should translate into being rich. Reality is apparently not cooperating.
>>
Will any of that shit improve my cooming experience soon, or should I come back in a year?
>>
>>102721719
Coming back in a year sounds like a good idea. That way all the incremental improvements will make a bigger impact, since you won't experience them incrementally.
>>
>>102721733
agi sexbots are coming out tomorrow and if you dont buy them youll miss your only chance for affordable robowife
>>
>>102721448
Introducing SmartTavern™, a frontend for Power Users.
Added features:
>ChatGPT-like interface
>Made ChatML the default preset
>Light mode
>Strict default word filter. Don't ever worry about cursing by accident.
Removed features:
>Character cards with images
>Options for custom avatar. It is enough for our users to have choices between the logos of popular providers.
>Default cards
>RP presets
>>
>>102721448
This news doesn't matter to me. If push comes to shove, I can just code my own personal frontend. In fact, most of /lmg/ can do so, given we've shared personal projects and prompting scripts for different use cases before. But we don't, simply because of how convenient SillyTavern is. If something were to happen to ST, you can bet your ass someone will fill those boots.
>>
applel bros...
https://github.com/JosefAlbers/e2tts-mlx
>>
I discovered a bug in ooba. Anyone surprised? Check this out:

Enter some prompt (any) and fix the seed (any). Then keep doing gens: 1st, 2nd, 3rd etc.

The 1st gen will be different from every n-th gen where n>1, while all those later gens will be identical to each other.

Change the prompt a little and repeat. Again, the 1st gen will be different, while all the others will be identical.
>>
>>102722057
Is that ooba or whatever "loader" you are using?
>>
>>102722057
You can reproduce it via the API (activate the API with --api):

import json
import requests
import sseclient  # pip install sseclient-py

url = "http://127.0.0.1:5000/v1/completions"

headers = {
    "Content-Type": "application/json"
}

data = {
    "prompt": "Continue writing the following story. Whiskers is a cute and lovely little kitten.",
    'max_tokens': 512,
    'temperature': 1,
    'top_p': 1,
    'top_k': 0,
    'typical_p': 1,
    'min_p': 0.05,
    'repetition_penalty': 1,
    'frequency_penalty': 0,
    'presence_penalty': 0,

    "seed": 12345,
    "stream": True,
}

for i in range(4):
    # Same prompt, same fixed seed every iteration.
    stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
    client = sseclient.SSEClient(stream_response)

    print(data['prompt'])
    print(data['seed'])
    for event in client.events():
        payload = json.loads(event.data)
        print(payload['choices'][0]['text'], end='')

    print('\r\n--------------------------------------------------------------')
>>
>>102722080
>I discovered a bug in ooba
llama.cpp in ooba
>>
>>102722105
I see.
So is the bug in ooba, in the python bindings, or in llama.cpp itself?
>>
File: Screenshot_3.jpg (185 KB, 1240x1184)
185 KB
185 KB JPG
>>102722080
>>
>>102722105
Then check llama-server on its own, retardo. There's like 3 independent projects in the middle.
>>
>>102722139
nobody cares
>>
>>102722130
holy nostalgia
>>
ST's anti-RP cleanup has already started btw
https://github.com/SillyTavern/SillyTavern/commit/4d35fea3b3243a02e333747b9298bada0fdb3aab
>>
>>102722363
Good for them.
>>
I always liked kobold lite better than ST
rest in piss you overrated bloated piece of shit
>>
File: annoyance.png (51 KB, 364x365)
51 KB
51 KB PNG
>>102722363
>>
>>102721448
It looks like he's mad at all the proxyfag locusts who only use ST to plop in their stolen Claude proxies without caring about stuff like samplers or prompt formats. Once again, /aicg/ ruins everything they touch.
>>
>>102722575
He's mad because of the association with proxy theft, and because being labeled the de facto "AI ERP software" isn't making him money.
>>
>>102722452
she didn't deserve this
>>
File: 11__00843_.png (2 MB, 1024x1024)
2 MB
2 MB PNG
First they came for the furfags
And I did not speak out
Because I was not a furfag
Then they came for the animetroons
And I did not speak out
Because I was not an animetroon
Then they came for the ERPers
And I did not speak out
Because I do not ahh ahh mistress
Then they came for the roleplayers
And I did not speak out
Because I was not a roleplayer
Then they came for me
And there was no one left
To speak out for me
>>
>>102722452
RIP, Seraphina
>>
>>102722363
AAAAH What's going on??
>>
>>102723106
She'll forever live in my personal fork.
>>
>>102723139
see
>>102721448
>>102721850
>>
>>102723173
>>102723173
>>102723173
>>
>>102716636
Not a chance. He'll be turning Gemini into CAI and you'll be thankful when a tiny droplet of that is distilled into Gemma
>>
>>102721448
MikuTavern fucking when?
>>
>>102721996
is this supposed to sound good?


