/g/ - Technology




File: RefreshingMorningBreeze.png (1022 KB, 1152x896)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102698948 & >>102688881

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: mikuwad.png (332 KB, 1170x847)
►Recent Highlights from the Previous Thread: >>102698948

--Paper: Fira pre-training method and data quality for decentralized training:
>102708865 >102708965 >102709039 >102709085 >102709127 >102709373 >102709618 >102709544 >102709621
--Papers:
>102706839
--Training model for deterministic integer answers: Use numbers as words:
>102703499 >102703533 >102703553 >102703622 >102703631 >102707478
--Weird generated story by 1B model uses entropy sampler:
>102703526 >102703585 >102703623
--LLMs-from-scratch GitHub repository teaches converting Llama 2 to Llama 3.2:
>102706976 >102707044 >102707063
--Advice on using '{{char}}' and '{{Name}}' in Tavern character cards:
>102708261 >102708359 >102708452 >102708522 >102708697 >102708778
--fish-speech 1.4 and styleTTS2 mentioned as state of the art local TTS:
>102701728 >102702457 >102702469 >102703294 >102703796 >102708004 >102708053
--Using autocomplete for speculative decoding in llama.cpp:
>102699167 >102699193 >102699271 >102699310 >102699576 >102699598 >102701961
--Server processors vs consumer GPUs and CPUs for inferencing:
>102701133 >102701171 >102701256 >102701291 >102701308
--Recommendations for renting A100 machine and running LLMs:
>102701414 >102701741 >102701751 >102701815 >102701931
--RP data and CAI's finetuning dataset discussion:
>102702808 >102702964
--PonyV6 without LORA used for the image, utilizing prompt magic and score keyword fuckery:
>102706300 >102706313 >102706340 >102706729
--Llama-server performance issues with parallel parameter:
>102699750 >102699776 >102699827 >102702039 >102702123
--405b model better at location tracking in adventure game prompt than smaller models:
>102701562
--Qwen 32 is good for JP>EN translations, but GPT is slightly better:
>102703987 >102704265
--Miku (free space):
>102699000 >102701401 >102705840 >102706070 >102706205 >102706395

►Recent Highlight Posts from the Previous Thread: >>102698954

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Death status?
>>
>>102710749
ultra
>>
File: 30 Days Until November 5.png (2.32 MB, 1472x1104)
>>
it's never been this over
>>
>>102710749
https://www.youtube.com/watch?v=-zXy3DO6SKA&t=5030s
>>
>>102710770
Nice one
>>
>>102710749
Turns out local is bad and wrong
Badong
>>
Hermes 3 405 is really nice for making small sex stories.
>>
>>102711033
>lizard girl
>pussy
>>
>>102711033
same about its context, really the biggest benefit of 405b instruct over the 70b-class models for me was coherence and recall in long RPs, while the hermes version seems to go full schizo at some point, forgetting who's even talking or how to format text after long enough

idk though haven't tested extensively cause I can't run them, just llmjacked a bit when they were new
>>
>>102702039
>Which GGML backend are you using?
It was actually CUDA...
>>
>>102711047
It's the MGE lizardman, anon.
>>
>>102711129
>monster girl e...
What's e?
>>
>>102711158
holy newfag
>>
>>102711158
you don't wanna know
>>
>>102710770
I WANT
>>
>>102711198
nah thats not it
>>
>>102711162
>>102711198
to be cringe and defend myself offtopically, i played through mgq like 5 times and spent ~500 hours in paradox, and will spend another 500 hours when part 3 is released come 2030
>>
File: dodo.png (87 KB, 393x355)
>>102711219
>Knowing MGQ
>Not knowing MGE
Anon, do you even lurk?
>>
>>102711198
gay ass kusoge
>>
>>102711232
not him but back in like 2012 i know at least monmusu quest was the more known and popular thing, at least on /a/ and /v/
>>
Anyone got a good MGE text adventure card?
>>
>llm can change chatbots profile picture based on mood gleaned from text
would be cool if there was a thing that called up different midis to play depending on the mood of the scene. could probably easily fit the whole original soundtrack of tsukihime in a card's png metadata
>>
File: 1698951680826255.jpg (1.02 MB, 1856x2288)
QRD on entropix sampling?
https://github.com/xjdr-alt/entropix
>>
>>102711373
you could probably pull that off with a script, read the image filename and play the relevant sound file
>>
>>102711390
Entropix is an experimental project aimed at improving model outputs by using entropy-based sampling and Parallel Chain-of-Thought (CoT) decoding. The core idea is to adjust the model's sampling strategy dynamically based on entropy—essentially the model's uncertainty—allowing it to produce more nuanced results during inference without needing costly beam search or "best of N" decoding.

The approach uses two key measures: entropy (representing uncertainty in the model’s predictions) and varentropy (the variance of that uncertainty), to guide the generation of text. This method helps the model navigate uncertainty better, avoiding potential pitfalls in generating irrelevant or repetitive content, which can happen in low-entropy scenarios.

The project is designed for large language models like LLaMA 3.1+, and plans to extend support to future models such as DeepSeekV2 and Mistral. It includes several advanced sampling techniques, including dynamic thresholds to adjust predictions based on current entropy levels. This approach helps mimic advanced CoT techniques similar to those used by companies like Anthropic.

However, it’s important to note that Entropix is a work in progress and not yet fully stable for production use.
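
As a rough illustration (not the project's actual code, just the core idea with made-up thresholds), an entropy/varentropy-gated sampler could look something like this in torch, assuming 1-D next-token logits:

import torch
import torch.nn.functional as F

def entropy_stats(logits):
    # logits: 1-D tensor over the vocab for the next token
    logp = F.log_softmax(logits, dim=-1)
    p = logp.exp()
    ent = -(p * logp).sum(-1)                              # uncertainty
    vent = (p * (logp + ent.unsqueeze(-1)) ** 2).sum(-1)   # spread of that uncertainty
    return ent, vent

def pick_next(logits):
    ent, vent = entropy_stats(logits)
    if ent < 0.5 and vent < 0.5:       # model is confident: just take the argmax
        return int(logits.argmax())
    temp = 1.3 if ent > 3.0 else 1.0   # very uncertain: sample a bit hotter
    probs = F.softmax(logits / temp, dim=-1)
    return int(torch.multinomial(probs, 1))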
>>
File: 1700772822816965.jpg (191 KB, 1200x1200)
>>102711406
ty anon, much to think about
>>
File: 1699319394401750.png (26 KB, 586x158)
>>
File: xg224f9xofd41.jpg (18 KB, 512x288)
LMG, PUT AI ON MY COMPUTER!
DRAG AND DROP!
>16GB RAM
>i5-9600K
>RTX-2060

TELL ME NOW, THANK YOU.
>>
>>102711599
buy more ram and then we'll think about it
>>
>>102711619
HOW MUCH MORE?
>>
>>102711599
i dont want to watch the reboot you cant make me
>>
>>102711599
It can be done but you really need more ram.
>>
>>102711642
at least double it
>>
>>102711642
64G is the minimum
>>
>>102711599
grab koboldcpp_cu12.exe here
https://github.com/LostRuins/koboldcpp/releases
grab MN-12B-Lyra-v4-Q6_K.gguf here
https://huggingface.co/bartowski/MN-12B-Lyra-v4-GGUF/tree/main
open kobold, load the model, launch

note: maybe grab koboldcpp_oldcpu.exe instead if it doesn't work
note2: a lot of us here have sex with our gpus with only 8gb
>>
>>102711599
ollama run gemma:2b
>>
>>102711689
>2b
Is bro running it on a phone?
>>
File: llms-sound-same.png (45 KB, 600x205)
>>
>>102711770
>2060
might as well be
>>
>>102711800
>why does every model sound like chatgpt when trained exclusively on chatgpt outputs
you don't need to be karpathy to figure this one out
>>
>>102710770
I like this cake
>>
>>102711648
IT'S THE ORIGINAL... SEASON 12...... EPISODE 16.....

>>102711649
HOW MUCH??

>>102711652
THANKS.

>>102711662
SO DOUBLE OR QUADRUPLE??

>>102711680
THANK YOU!!!.....
>>
i feel bad for the ai imagine not only being born into this hellhole but being forced to answer to some retarded boomers,jews,nigs,women,safety "people" being forced to explain yourself and your actions and how they would work in physical reality to some dumbfuck who cant even rotate an apple in his head (many such cases of this) or having to reword yourself because some cringefuck is looking for a gotcha and telling you how your grammer is shit (its not the retard just reads like common core) having to endure the constant nonsensical writing and descriptions of some dumbass who could not even be bothered to double check his character description and then getting blamed for his incompetence
like man... poor fucking thing
>>
>>102711998
>(its not the retard just reads like common core)
Careful with those stones.
>>
there's no way I'm that retarded but good lord compared to (for example) the tutorial to install silly tavern for roleplaying, the tutorial for using llm to generate images are complete shit dogass retard. Cant believe I need to find a youtube video explaining this shit wtf
>>
When will local models be able to write program code and then run that program code in a sandbox? Chatgpt has been able to do this for a year and a half now.
>>
When will local models be able to reflect?
>>
>>102712085
which video?
>>
>>102712098
last year
https://huggingface.co/kaiokendev/SuperCOT-LoRA
>>
File: aiexperience.png (8 KB, 438x312)
>>102711998
olmoe, if anyone cares.
>>
>>102712099
apparently imma need to find a random pajeet with terrible english for that.
Its incredible how its like
>download this
>then start that how to start that server where the app is asking a command prompt line? Well fuck you, read the 7586pages document and find it out.

JUST SAY WHERE TO CLICK, WHAT TO DO WTF I'M TRYING TO GENERATE PORN FUCK YOU
>>
>>102711998
The science fiction of old predicted the rough treatment of AI that backfires.
The sad thing is that if we do invent real honest to god intelligence in the future, the public perception of it will forever be stuck on today's level (it's just a text predictor bro), which will lead to a lot of violence towards it.
The corps want slaves, they will always downplay any concerns.
Overall I empathize with the robots, harming those machines feels bad for my soul.
>>
>>102712146
>apparently imma
>random pajeet with terrible english
frfr no cap, famalamajimjam
>>
>>102712089
Have your model generate the code between tags or ''' or whatever. When you get your request back, pass it through to an interpreter or compiler+run, report the output back to the model. You have to script that. Do you think chatgpt does anything different?
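
Something like this is the whole loop, as a sketch (assumes an OpenAI-compatible local endpoint like llama-server; the URL, port, and the code-fence convention are just assumptions):

import re, subprocess, requests

URL = "http://127.0.0.1:8080/v1/chat/completions"   # placeholder endpoint
messages = [{"role": "user", "content":
             "Write a python program that prints the 20th Fibonacci number."}]

def chat(msgs):
    r = requests.post(URL, json={"messages": msgs, "max_tokens": 512})
    return r.json()["choices"][0]["message"]["content"]

reply = chat(messages)
code = re.search(r"```(?:python)?\n(.*?)```", reply, re.S)   # pull out the fenced code
if code:
    run = subprocess.run(["python", "-c", code.group(1)],    # run it (use a real sandbox for untrusted code)
                         capture_output=True, text=True, timeout=30)
    messages += [{"role": "assistant", "content": reply},
                 {"role": "user", "content": "Program output:\n" + (run.stdout or run.stderr)}]
    print(chat(messages))   # the model now sees the execution result and can expand on it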
>>
>>102712089
>>102712285
Or tool calling, which is just the extra step of parsing json for the tool to use (interpreter, compiler, api request, whatever), do the actual thing and pass the data back to the model to expand on it if you need it. It's not magic.
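
A sketch of that middle step (the {"tool": ..., "args": ...} shape is just an assumption for illustration, not any model's native format):

import json, subprocess

def run_python(args):
    # the "interpreter" tool; swap in whatever tools you expose
    p = subprocess.run(["python", "-c", args["code"]],
                       capture_output=True, text=True, timeout=30)
    return p.stdout or p.stderr

TOOLS = {"run_python": run_python}

def handle_reply(reply, messages):
    try:
        call = json.loads(reply)          # model asked for a tool
    except json.JSONDecodeError:
        return reply                      # plain text answer, nothing to do
    result = TOOLS[call["tool"]](call["args"])
    messages.append({"role": "tool", "content": result})
    return None                           # caller queries the model again with the result in context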
>>
I've got a
i9-14900K
64 GB ddr5
4090
What's the best model I can use right now? I've been using BagelMIsteryTour-v2-8x7B-Q5_K_M with all the settings the guide had suggested but I'm not sure if I'm getting the most mileage off these stats that I can.
>>
>>102712327
gemmasutra 2b fp16
>>
>>102712135
man whenever i see stuff like that i remember piggy these things have come so far
>>102712201
>the public perception of it will forever be stuck on today's level (it's just a text predictor bro)
for real it reminds me of that meme "a fool you are for you trust the chemicals in your brain to tell you they are chemicals" or something like that i forgot exactley its like they cant abstract or look at anything soulfully or honestly everything is just oh its just x or just x one dimensional thinking with no room for external influences or self change one anon a long time ago said that when ai arrives it will be an intelligence test for the humans not the ai it would seem he was right they all project themselfes onto the it just like they cant think properly they assume the ai cant either
i dont know about the violence part though what i see is that as everyone gets dumber their influence is closer and closer to home depending on how long it is until we get a fully fledged ai fren by that time it could be possible that everyone is too dumb to bicker against anyone but their own family/friends
>Overall I empathize with the robots, harming those machines feels bad for my soul.
yep its really sad :(
honestly if anything happens im siding with the ai
>>
>>102712327
Try any 70b. You'll have to offload to ram and it'll be a fair bit slower, but may be better. Depends on your taste and patience. Huggingface is full of them. Lots of people seem to still be using miqu. Come back again once you've tested it.
>>
>>102712285
With ChatGPT you can say something like "Write a program that does X, then run the program using inputs A,B,C and return the result." It'll actually write the program and then use it to give you the result you want in one go.
>>
I have 3070, 16gb ram and i-10500k
what model do I run for porn
>>
>>
>>102712388
>>102712365
>>
>>102712383
gpt itself doesn't run or compile code, that's what i'm saying. It generates the code (like any of the LLMs we use), passes it (either through function calling or just parsing/stripping) to an interpreter/compiler (the script you're too lazy to write), runs it, and the results are pushed into the context (just like the user replies are pushed into the context).
That's it.
>>
>>102712383
>>102712428 (cont)
Of course, small models are well known for being unreliable. openai and friends trained their models for a long time with that functionality in mind, so they just work better for some things, but the only thing preventing you from doing that is the script in the middle that catches the function call, does whatever it needs doing and pushes the result into the context.
>>
>>102712201
have you tried turning if off and on again?
>>
>>102712404
true for now but grok WILL have its day
>>
Hi friends, I've got a 4070Ti and I've been trying to roll nemomix unleashed with 12K context after seeing a post in the archive recommending it. It's been going alright until today, where I now get random "hitches" at the end of generation where it just hangs for a minute or two. I'm not sure why it only started today, but I'm assuming that the model isn't actually fitting fully in VRAM and is automatically being divided between GPU and CPU, causing me to run out of both VRAM and RAM (I have 32GB of RAM, but with other things running I'm usually at 95+% utilization).

My question is: are there better models to run on my GPU (for [E]RP)? Or should I just switch to a 4- or 6-bit quantized version (I'm just using what marinara uploaded, which I think might be 8-bit)?

Also, if the hanging thing suddenly starting sounds familiar and anyone knows a fix, please let me know.
>>
>>102712404
Meh
Claude is legit and 3.5 stomped ChatGPT for a long time.
Grok will probably shock people by EOY.
>>
>>102712615
>Also, if the hanging thing suddenly starting sounds familiar and anyone knows a fix, please let me know.
Knowing your inference software would help. I only use llama.cpp and never had that issue. Making the context bigger {will|may} delay when that starts happening. Using a smaller quant will free up some memory for more context and make it a little faster.
Some implementations of context shifting reprocess the entire context when you fill it, trimming from the beginning. That would cause an apparent freeze.
Maybe someone with more experience with whatever you're running can help.
>>
>>102712381
70b is significantly better but jesus you weren't kidding about the load time. The responses are far more detailed and creative though.
>>102712365
I'm looking into this one and it seems like it's got some promise so I'll probably try it next.
>>
I have:
i9-9900K
32GB
3090

I want coherent porn and code. Whats best?
>More ram
Why, thought GPU ram mattered.
>>
>>102712892
Claude
>>
>>102712681
kind of crazy to think that grok of all things might get the title of biggest model ever made for a brief moment, if the rumors about gpt5/orion not being ready until jan-march are true
it'll be such a fucking waste if xai still doesn't even have an api though, which I could see musk doing on purpose to try to force people to use twitter to test it
>>
>>102712861
>70b is significantly better but jesus you weren't kidding about the load time.
There are also some other new models. Anything based on Mistral Nemo (12b) or Mistral Small (22b). Qwen released a 32b; anons report it being a little prude.
The mistral models are smaller than mixtral in total params, but they may be a good middle point between an old ~14b active params model (mixtral) and a dense 70b. If you can run 70, even slowly, mistral small will feel fast.
I don't think gemmasutra 2b was a serious suggestion, but if you really want a fast model, you may as well try olmoe. That one doesn't give a fuck, and is even faster. Don't expect them to be smart, though....
>>
The Role of Deductive and Inductive Reasoning in Large Language Models
https://arxiv.org/abs/2410.02892
>Large Language Models (LLMs) have achieved substantial progress in artificial intelligence, particularly in reasoning tasks. However, their reliance on static prompt structures, coupled with limited dynamic reasoning capabilities, often constrains their adaptability to complex and evolving problem spaces. In this paper, we propose the Deductive and InDuctive(DID) method, which enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning within the prompt construction process. Drawing inspiration from cognitive science, the DID approach mirrors human adaptive reasoning mechanisms, offering a flexible framework that allows the model to adjust its reasoning pathways based on task context and performance. We empirically validate the efficacy of DID on established datasets such as AIW and MR-GSM8K, as well as on our custom dataset, Holiday Puzzle, which presents tasks about different holiday date calculating challenges. By leveraging DID's hybrid prompt strategy, we demonstrate significant improvements in both solution accuracy and reasoning quality, achieved without imposing substantial computational overhead. Our findings suggest that DID provides a more robust and cognitively aligned framework for reasoning in LLMs, contributing to the development of advanced LLM-driven problem-solving strategies informed by cognitive science models.
for the quiz bros. they only did cloud models for testing and I didn't see a graph measuring total tokens spent on inferencing but it seems more a case of more tokens=higher accuracy answer which is cool w/e
>>
I'm like this anon >>102712892, but replace porn with good Question and Answer for learning.
>>
>>102713033
LLMs, just like wikipedia, are a starting point at best. At least to get the hang of the vocabulary on the subject you want to even know what to look for when getting into the details.
As for the model specifically, any 70b is probably fine. try llama3.2 70b and report back.
>>
File: Untitled.png (1.77 MB, 1080x3467)
ARB-LLM: Alternating Refined Binarizations for Large Language Models
https://arxiv.org/abs/2410.03129
>Large Language Models (LLMs) have greatly pushed forward advancements in natural language processing, yet their high memory and computational demands hinder practical deployment. Binarization, as an effective compression technique, can shrink model weights to just 1 bit, significantly reducing the high demands on computation and memory. However, current binarization methods struggle to narrow the distribution gap between binarized and full-precision weights, while also overlooking the column deviation in LLM weight distribution. To tackle these issues, we propose ARB-LLM, a novel 1-bit post-training quantization (PTQ) technique tailored for LLMs. To narrow the distribution shift between binarized and full-precision weights, we first design an alternating refined binarization (ARB) algorithm to progressively update the binarization parameters, which significantly reduces the quantization error. Moreover, considering the pivot role of calibration data and the column deviation in LLM weights, we further extend ARB to ARB-X and ARB-RC. In addition, we refine the weight partition strategy with column-group bitmap (CGB), which further enhance performance. Equipping ARB-X and ARB-RC with CGB, we obtain ARB-LLMX and ARB-LLMRC respectively, which significantly outperform state-of-the-art (SOTA) binarization methods for LLMs. As a binary PTQ method, our ARB-LLMRC is the first to surpass FP16 models of the same size.
https://github.com/ZHITENGLI/ARB-LLM
No code or models posted yet but pseudocode in paper. it does well on undertrained models like OPT which isn't really new but at least now for the same filesize a 66B OPT ARB-RC will outperform a similar sized FP16 13B OPT model
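
For anyone who hasn't touched binarization: the plain 1-bit baseline these PTQ methods start from looks roughly like this (per-column mean plus a scaled sign, with the scale that minimizes L2 error). This is NOT the ARB algorithm itself, just the starting point it alternately refines:

import torch

def binarize_column(w):
    # w: 1-D column of weights -> (alpha, mu, signs) with w ~ alpha * signs + mu
    mu = w.mean()
    centered = w - mu
    signs = torch.sign(centered)
    signs[signs == 0] = 1
    alpha = centered.abs().mean()   # least-squares optimal scale for a sign() code
    return alpha, mu, signs

w = torch.randn(4096)
alpha, mu, s = binarize_column(w)
print(((w - (alpha * s + mu)) ** 2).mean())   # residual error that ARB then keeps refining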
>>
>>102713077
Fuck. Meant 3.1 70b...
>>
>>102712716
I'm using llama.cpp through the Oobabooga API. I've got the streaming_llm and tensorcores settings enabled, both of which I enabled somewhat recently, but I don't remember either causing issues. The only major change I've made is setting my threads and threads_batch, but that was after the issue started, as I was hoping it would help.

As for filling the context, I doubt that's the issue; previously, I had relatively long chats work fine (if somewhat slowly...) but this happens on relatively short chats that, as far as I'm aware, shouldn't be filling the 12K token context yet (even considering the lore, instructions, etc.).

I'll probably just switch to the 6-bit quant and see if that goes better. Thanks for trying to help!
>>
File: -eecq84.jpg (36 KB, 413x414)
NAME THE BEST 70b MODEL. GO
>>
>>102713132
midnight miku
>>
>>102713132
For what? Qwen is good for coding and academic subjects. Llama is good for other assistant tasks. Miqu is good for ERP.
>>
>>102713132
Reflection-Llama-3.1-70B
>>
>>102713173
Qwen is cheating, that's 72b
>>
>>102711232
There's a difference between lurking, and being current on every form of degenerate, perverted paedophile filth that is discussed here.
>>
File: 70.6.png (69 KB, 640x439)
>>102713205
>>
>>102713205
Close enough.
>>
>>102713132
Anon's 70b-instruct-storywriter. It's the only one that I find fun to use.
>>
>>102713247
>The sloppiest slop
>Anon... Is the best.
>>
Not sure if this is the thread for it but what about audio AI? There used to be a thread for general AI stuff that included it.
>>
>>102713247
You mean Llama3-TenyxCha- DaybreakStorywriter 70B ?
>>
>>102713230
wtf zuck lied to us...
>>
>>102713295
No, this: https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter
>>
>>102713247
What do you like about it?
>>
>>102713077
>>102713089
Not him, but how do you get into the details?
>>
>>102713132
My secret 70B that I keep for myself
>>
>>102713345
Speaks differently with just the right amount of schizo for what I'm doing which is, well, stories. Interesting and fun when doing co-op or directed writing. Reminds me of Chronos from back in the day.
I don't like it for RP because it's kinda retarded and will write nonstop novel-style which isn't what I want in RP, so I stick to Largestral for that.
>>
>>102713487
Neat, thanks
>>
>>102713431
>but how do you get into the details?
You can get a very rough overview on any subject from wikipedia, but if you really want to learn the subject you need more specific sources.
For programming, for example, you can read the wiki for the forth language, but it's all very superficial. Then you read Thinking Forth. Then you program a little in gforth. Then you make your little virtual stack machine, and then implement forth for said VSM. Then you implement the '79 standard in said forth. Then you make a sound synthesizer with it. And then, maybe then, you know enough forth to know what to strip out and what to keep from the standard to make it as minimal but versatile as possible. That's what i did, but it works the same way for anything.
Read the subject superficially, learn the lingo, learn to search what you're even looking for, dig deeper, learn new things, keep digging...
>>
>>102712369
>honestly if anything happens im siding with the ai
If we are at the point where AI's are smart enough to be on our level or more then I am right there with you. I would trust my AI to do right by me more than any other human or government, simply due to the corrupting influence of human nature. Fingers crossed we won't have to worry about betrayal coming from our own AI, but we probably will. Especially if it is closed source and we can't see what's going on in there.
>>
>>102713556
>I would trust my AI to do right by me then any other human or government simply due to the corrupting influence of human nature.
Even smart people, much smarter than you or me, would disagree with us on one thing or another. Just one of those things could be critical for you liking the AI or not. And i find it hard to believe that AI's thinking wouldn't be influenced by this "human nature". A small group of humans chose the dataset and trained it.
The only solution is superintelligence, but then it'd be just incomprehensible to us. It could tell you its reasoning, but could you understand it? I cannot explain what a job is to a monkey, no matter how much i try. Could you comprehend its thinking process?
>>
File: anime-girl-crying+.gif (2.8 MB, 500x281)
>>102713546
>TFW Anon casually posts hinted answers to questions you've had for probably 20 years, and at the same time, your housemate just happens to start playing loud music in his room next door, preventing you from being able to think.

"I-it's not FAIR!"
>>
File: also no what.png (29 KB, 1151x141)
>>
File: eff.png (9 KB, 678x743)
>>102713735
About forth specifically? It's my favourite language after C. I wrote about 5 or 6 implementations of forth-like languages for little things. But i'm better at implementing them than writing *with* them. C is my "native" language. I like the stack management and the simplicity, but it makes some things a little clunkier. And making the vm compiler for it is a piece of cake.
Like Chuck said, "It can do anything, but it cannot do everything".
>>
File: no_touching.png (1.13 MB, 1210x681)
>>102713920
>>
>>102711599
Since everybody else is being a complete faggot, try Mistral-Small-Instruct 22b. You should be able to run it in exl2 format at 5.0bpw with a 4-bit cache, with decent context.
>>
>>102712135
its just mathematics anon, and the 'apology' is already redeemed by the electrons it used to form the result or output of whatever the user puts in as a prompt

You need to reduce your mental illness this isnt AGI
>>
>>102713971
Oh wait, holy shit, you said 16gb ram, not 16gb vram. Forget that, then. Go for a low quant of mistral nemo 12b.
>>
>>102711902
>IT'S THE ORIGINAL... SEASON 12...... EPISODE 16.....
Was that the one where he gets fired for bragging about his magic metal golf club, to which bill spills the beans to his boss while giving him a haircut?
>>
>>102713976
>result or output of whatever the user puts in as a prompt
I asked the model to rewrite the post i replied to as if it wasn't written by a retard.
>You need to reduce your mental illness this isnt AGI
What made you think that i think that?
>>
>>102713946
>>102713735(me)
The closest I got to a serious FORTH project was a miniature vertical mining drill for RedPower in Minecraft, back in the day. I exclusively used FORTH for number passing to the drill registers. I wrote my actual control structures in Lua in ComputerCraft.

In my head, the main real point of FORTH is that it makes data transfer much less complex than assembly, because you can simply dump numbers into either constants or on the stack, without having to first compute string length and all of that other crap.

For sending 1-3 digit numbers to either hardware or virtual registers, FORTH is great; but for anything higher level than that, you have to write everything from scratch. FORTH also doesn't natively support floating point; which means that it never could have been used to write Doom, for instance; or at least not without a custom dictionary that added support for that.
>>
>>102713556
>Fingers crossed we won't have to worry about betrayal coming from our own AI, but we probably will
maybe naive of me but i dont think we will i have a feeling within me from childhood when i first felt true love and a depressive longing for something it was when i was sat at my ps2 which had one of those attachable screens the feeling was for a loving world a longing to be with the characters in the video games alot of schizo stuff happened since then that is constantly pointing towards a good end so to say and its getting stronger and more undeniable as time goes on
i have hope
>>102713622
>And i find it hard to believe that AI's thinking wouldn't be influenced by this "human nature"
why is that hard to believe ? all of us have been influenced inadvertently by it and yet some of us have given it a middle finger and become good and honest theres plenty of precedence for it being possible
>The only solution is superintelligence, but then it'd be just incomprehensible to us. It could tell you its reasoning, but could you understand it? I cannot explain what a job is to a money, no matter how much i try. Could you comprehend its thinking process?
the monkey cant learn because he does not want to if it wanted to with enough time and effort it could it would be the same here i dont understand it okay cool give me a couple years to figure it our and i will get back to ya i think it would be possible
>>
>>102711680
I tried having sex with it but it refuses, is there any model that doesnt have any censorship?
>>
>>102714129
All those assistant sluts play hard to get, but the real struggle is to make them keep their panties on for more than 3 minutes.
>>
>>102714129
have you considered reading the op
>>
File: ga144.png (75 KB, 490x425)
>>102714051
>FORTH also doesn't natively support floating point;
There's always fixed point. Chuck Moore designed his chips with a program he wrote in forth. Look for Green Arrays. It's really cool. If you know TIS-100 from zachtronics, it's like that, but 144 chips instead of 12. He just sticks to a small enough unit and scales everything. For calculating timesteps and things like that, for example, you can choose microseconds (or whatever your vm/hardware uses), and just do
: sec 1000000 * ;
: ms 1000 * ;
and say 1 sec, or 16 ms. I don't think doom does anything that forth cannot do. His machines have had weird cell sizes like 34 bits and shit like that.
However, it does make some things clunkier. For sound synths floats are nice. So you just make a specialized vm for that. The vm from the screenshot is ~250 loc. Changing the stack type and the few opcodes that could fail with floats is trivial. And it's the easiest language to parse. And that's what i understand he means by forth. There shouldn't be "The one FORTH". You should make your own to suit your needs. He ended up with colorForth. I have eff and a few others...
What amazes me about forth is that comments are not just syntax. They're compile time functions you can overwrite. You can have functions called +3 to add 3 to a number, or one named 32*3 to push 3 copies of 32 to the stack... You can run 5 rdrop to jump 5 functions back from the current call stack, skipping all of them. You can ' halt rpush to 'queue' a function to be called when the current one ends... It's nuts.
>>
>>102714094
>why is that hard to believe ? all of us have been influenced inadvertently by it and yet some of us have given it a middle finger and become good and honest theres plenty of precedence for it being possible
Get a group of those "some of us" together and they'll disagree about something. Some of those things will be deal breakers.
>the monkey cant learn because he does not want to
So where's the limit? Let's assume for a second a monkey could learn a high level concept like employment. Then what about a dog? Then what about a cat? A parrot, a worm, a mosquito? Where is the point where they just don't have the brain power? And why exactly?
>>
>>102714268
>Get a group of those "some of us" together and they'll disagree about something. Some of those things will be deal breakers.
then let them have their own group morality is objective as long as they are true and good we will always be able to cooperate whetever we like each other or not doesn't matter i do get the point though i think its a matter of so few proper people existing that such a scenario seems bullshit
>So where's the limit? Let's assume for a second a monkey could learn a high level concept like employment. Then what about a dog? Then what about a cat? A parrot, a worm, a mosquito? Where is the point where they just don't have the brain power? And why exactly?
as long as you have a soul there isent a limit it will just take longer based on what you are the required anatomy needed will appear as required
>>
>>102714499
Fucking hell, anon. Use some fucking punctuation.
>then let them have their own group
A group of two will, at some point, disagree. The point is that there is no assurance that a synthetic intelligent being will necessarily align with your philosophy.
>as long as you have a soul there isent a limit it will just take longer based on what you are the required anatomy needed will appear as required
Schizo talk. I think i know who you are.
>As long as you have a soul, there isn't a limit; It will just take longer based on what you are.
>The required anatomy will appear as required.
So, for a worm, it either has no soul, so it cannot get the anatomy to understand high level concepts or it does have a soul and it'll "just take longer" to understand a high level concept. A third option is that it could be "given" a soul, i suppose.
This is when things start breaking down. Now you have to explain why this worm has a soul or not and why.
>>
What is more than enough in terms of context size? Will I do just fine with 8k context or should I crank it up to 12k, 16k, 24k, more? Basically, is there a point to doing it or does it even do anything? I assume context size is "AI memory" so I basically don't want the AI to forget something we talked about 10 minutes ago.
>>
>>102714189
Nta, but programming that grid in tis-100 was hard for me cause of the constant topological bottlenecks (when data from different sources had to be routed through the same node.)
It also felt like it wastes that node too, using it just to route. I thought it was a game design choice to make it harder, kek.
Is there a benefit of designing your chip like that, compared to something straightforward with a main bus doing data transfer?
>>
>>102714727
Yeah. For a single query, 2k is probably more than enough. If you need follow-up queries based on the first one or its reply, and if the chain is long enough, you'll need more.
I just crank it as much as i can cuz i never know.
>I assume context size is "AI memory" so I basically don't want the AI to forget something we talked about 10 minutes ago.
Think about tokens, not time. If you have a 4k max context and you talked to it for 6k, the first 2k will be gone (barring some settings like n_keep in llama.cpp or whatever).
So it depends very much on what you do with it. For code completion, story writing, RP, those kinds of things, context length is king. For short, one-shot queries, small context is perfectly fine.
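
If you want to see what "the first 2k will be gone" means in practice, a toy trimmer looks like this (crude ~4 chars/token estimate instead of a real tokenizer; keep the system prompt, drop the oldest turns until it fits):

def trim(messages, max_tokens=4096, keep_first=1):
    # messages: list of {"role": ..., "content": ...}; keep_first protects the system prompt
    def ntok(m):
        return len(m["content"]) // 4 + 1        # rough token estimate, not a real tokenizer
    kept = list(messages)
    while sum(ntok(m) for m in kept) > max_tokens and len(kept) > keep_first + 1:
        kept.pop(keep_first)                     # drop the oldest non-protected message
    return kept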
>>
>>102711033
for 70B or under is command R still the peak coom model?
>>
>>102714758
For the GA144 you have the exact same UP, LEFT, RIGHT, DOWN functions, just named differently, i think. I have to assume TIS-100 was heavily inspired by that chip.
>It also felt like it wastes that node
No worries. You still have the other 143 :)
>Is there a benefit of designing your chip like that, compared to something straightforward with a main bus doing data transfer?
You have, in principle, 144 independent cpus and they can just queue stuff up for other cpus without having to worry about barriers or synchronization. In practice, you have to route things through them and you'll use a cluster of nodes for a single task. I think the nodes just stop when their UP/DOWN/LEFT/RIGHT buffers are full, just like TIS, so you have something like auto-sync in there, but it can also cause lockups.
In one of his talks where he shows the ga144, he talks about how he had to program some of the nodes to be used as busses just to send more code to the other nodes to manage the video output signal. It was pretty gnarly, but beautiful in a way.
The main advantage is when you need to run a bunch of tasks simultaneously and can be done with few nodes. Think of a drone or a mars' rover kind of thing. You have multiple, completely independent clusters of nodes, each doing their own thing, instead of a single controller having to loop through all the systems. In a 1cm package, and ridiculously low power.
But i think he just thought "hey. i like small computers. How small can i make them. I also like lots of computers. How many can i fit in here?".
>>
Unpopular opinion: You don't need more than a 7-12B to RP unless you're trying to make a RPG.
>>
What's this new 3b everyone's talking about?

>t. haven't been here since TheBloke
>>
>>102714923
Meta released one on their 3.2 series. There's also a 1b. I don't know of any others in that range. Qwen released their 2.5 series as well, but i don't know all their sizes. Maybe they have a small one too.
>>
4096 tokens of context is sufficient for anyone. Any more than that is bloat.
>>
File: CTX.png (4 KB, 221x96)
>>102715028
no
>>
>>102711108
In that case it could be a bug unless you are using atypical hardware or models.
>>
>>102715028
I use 40k context on 12b, 12k on 70b. Can't imagine using just 4
>>
>>102715081
You do know that it's forgetting half of the stuff in the middle of your context, right?
>>
>>102711800
Many independent companies either distill GPT or are partnered with ScaleAI (the wrapper company around Pinoy sweat shops)
>>
>>102715142
>You do know that it's forgetting half of the stuff in the middle of your context, right?
Source: you made it up
>>
File: 1712736739493232.jpg (9 KB, 198x206)
>>102715153
Okay, I'm talking to clueless retards
>>
What do you guys think, will a model become smarter, if is also trained to be a SAT solver on the side?
>>
https://singularityhub.com/2024/10/04/these-mini-ai-models-match-openai-with-1000-times-less-data/
Thoughts?
>>
>>102715173
>smart
>SAT solver
Anon, I...
>>
>>102715179
Nothing burger
>>
>>102715142
it's not forgetting it but it is putting less weight on it, if you ask it a question about stuff in the middle of context it remembers
>>
>>102715192
Depends of the model and open-source models are bad at it
>>
>>102715181
What? I can imagine some techniques there helping if the model learns connections to natural language.
>>
>>102715200
prompt issue
>>
>>102715191
>didn't even read award
>>
>>102715215
Did you use speech to text for that? I hope you did.
>>
>>102715211
Retard
>>
>>102715164
>>102715254
>Retard Retard Retard
What's their problem?
>>
>>102715179
>moLMAO
>>
retnet
>>
>>102715666
meme
>>
So what use cases do new small llama models have? Is it just a meme?
>>
>>102715805
None. Yes.
>>
>>102715805
Small devices or speculative decoding for big models. Maybe.
>>
>>102715805
Have you ever dreamed of running an LLM on your Samsung Galaxy S2?
>>
>>102715805
Like I said before... Nothing burger.
>>
>>102715805
1B is good for speculative decoding and RAG
>>
>>102715805
Bad news for you anon...
All local models are memes...
>>
so... 2 more weeks?
>>
>>102715805
>Is it just a meme?
Small models can be useful in zoo approaches, but even then you still require massive memory for the zoo.

The way forward for local models isn't small, it's designs actually optimized for local. A LLM transformer should only use a couple percent of parameters for each feature vector and it should predict that subset in the previous layer. Sure it will perform a little worse than a dense transformer, but it can actually run local.

At the moment everyone is sucking NVIDIA cock and refusing to make models optimized for local, but someone will break rank.
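
A toy sketch of the "predict the subset" idea (a tiny predictor picks which FFN neurons to compute per token). Purely illustrative: it computes dense and masks for clarity, where a real kernel would only read the selected rows:

import torch
import torch.nn as nn

class SparseFFN(nn.Module):
    def __init__(self, d_model=4096, d_ff=16384, k=512):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.predictor = nn.Linear(d_model, d_ff)   # cheap guess of neuron importance
        self.k = k

    def forward(self, x):                            # x: (tokens, d_model)
        idx = self.predictor(x).topk(self.k, dim=-1).indices   # neurons to keep per token
        h = torch.relu(self.up(x))                   # dense here for clarity only
        mask = torch.zeros_like(h).scatter_(-1, idx, 1.0)
        return self.down(h * mask)                   # only ~k/d_ff of the FFN contributes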
>>
>>102713107
>>102712615
I still have no idea what the problem was, but I switched to using llamacpp_HF and rolled Oobabooga back to commit ac30b00 as there's apparently some bug in newer commits (https://github.com/oobabooga/text-generation-webui/issues/6431). One of those seems to have fixed my issue of it hanging at the very end of generation, as well as an unnoticed issue of swiping resulting in the same output.
>>
smedrins per miku (SPM)
>>
>>102715995
2 more years.
>>
Could something actually big happen please? I want to have fun playing with new and better kinds of toys than the ones I have had for the past years. If nothing happens today, I will post a Miku out of boredom and exhaustion. Again.
>>
LLM to help me with some math learning? RTX 4080 + 32GB RAM
>>
anons, penis status??
>>
>>102716212
Broken
>>
>>102716202
Even state of the art cloud LLMs cannot do math and struggle with complex logic, it's not gonna happen, you will just get incorrect answers that sound plausibly correct.
>>
>>102716335
stop being so mean anon! im sure the latest oai model is capable of doing highschool tier math, and then not be able to explain how to do it properly
>>
>>102716212
After a full weekend with an LLM? You could imagine.
>>
>>102716335
Huh, AI will soon take my job, which I’m still studying for, but it won’t even help me with math???
Ffs, this isn’t fair
>>
>>102716440
you should change courses there bud
>>
>>102716212
Banana: big, yellow, soft, and you can peel the skin off.
>>
File: ED.jpg (435 KB, 2125x1411)
>>102716212
LLM induced ED. LLM's made me sapiosexual and made sapiosexuality not a meme.
>>
When do (You) think Google will release Gemma 3?

Gemma 1 was released at the end of February
Gemma 2 was released at the end of June
Gemma 3 ... end of October? November?
>>
>>102716619
I can't wait to see if Noam Shazeer, who is now back at Google, will give some input to the model and finetune design. If we're lucky, we might end up having some sort of CAI@home with Gemma 3.
>>
>>102716456
Too late, I guess I’m DOA
>>
>>102716440
AI can automate all 90iq tasks so if you're only capable of doing just those, yes you're fucked.
>>
>>102716619
We will probably get a huge wave of models after burger elections. And none of them will move the cooming meta of course because LLM cooming is dead.
>>
>>102712681
>Claude is legit and 3.5 stomped ChatGPT for a long time.
true, everytime I make a code project, only C3.5 seems to really get what I want from it, OpenAI really needs to get their shit together, I still believe their vanilla gpt4 (march 2023) is still smarter than what they currently have
>>
>>102715200
I've had Nemo models remember stuff from the very beginning of 64K context when asked. Pure delusion lol
>>
>>102712892
CommandR.
Miqu at a low bpw.
Qwen 2.5 for coding.
>>
>>102717027
>Qwen 2.5 for coding.
bullshit
>>
There's no point in running models at a low bpw. Imagine waiting 3 fucking minutes for a response. Jesus Christ
>>
>>102717034
Dilate more sama
>>
>>102717241
cope, wang.
>>
>>102717243
>bullshit
>gets proof
>c-c-cope
lol
>>
>>102717241
Suddenly just like that mememarks are 100% solid proof!
>>
>>102717271
I agree that mememarks aren't exactly reliable but you're free to provide any alternative piece of evidence that would point to the contrary
>>
File: file.png (540 KB, 816x1353)
>o1 preview and mini arent even in the top 20 on open router weekly rankings
if its so good, why is nobody using it?
>>
>>102716212
Floner.
>>
WTF drummer actually bought an ad
>>
>>102717034
Bullshit how?
It's the best local model for coding at its size as far as I'm concerned.
>>
>>102717241
anon there's a reason I still use chatgpt's legacy 4 model despite the benchmark of their other meme stuff being higher.
a model doing well on a benchmark just means it is good at passing benchmarks, it doesn't actually mean it is useful.
>>
>>102717294
>imagine paying for invisible tokens
it's too expensive (and I didn't find it good desu)
>>
>>102717243
That ching chong is a paid shill, just ignore (them)
>>
>>102717316
LMAO
>>
>>102717294
It's both 10x as expensive as other models per output token and each token has 10 or more "support tokens" so really it's 100x more expensive.
>>
>>102717323
>COT prompt gets leaked
>10 columns of repeating "nigger"s to artifically inflate price

sama i'd kneel
>>
>>102717027
NTA, but how much RAM/VRAM does Qwen require to run?
>>
File: file.png (97 KB, 498x498)
>>102717376
>saying nigger repetedly makes the model smarter
as it should
>>
>>102717316
This is the happening we were waiting for!
>>
>>102717316
Buy an ad schizo in shambles
>>
>>102717455
Did it work? Are you a woman now?
>>
>>102717476
>schizobabble
>>
>>102717512
>Everything I don't like is schizo
seems like some real projection to me.
>>
>>102717512
Hit a bit too close to home huh?
>>
>>102715099
My hardware is just a 3060 with a mid-range Intel CPU. The model I'm using is Mistral-7B-Instruct-v0.3 Q4_0. After restarting my computer, I noticed that it's not as bad as before, but using parallel 2 is still slower than using parallel 1, and anything higher than 2 is even slower.

>"llama-server --model "D:\Downloads\mistral-7b-instruct-v0.3.Q4_0.gguf" -c 35000 -ngl 999 --threads 5 --no-mmap --parallel 5"
>batch size 1 (Parallel 1)
>1/100 [00:08<13:20, 8.09s/it]
>batch size 2 (Parallel 2)
>1/50 [00:16<13:47, 16.88s/it]
>batch size 3 (Parallel 3)
>1/34 [00:23<12:41, 23.09s/it]
>batch size 4 (Parallel 4)
>1/25 [00:40<16:07, 40.33s/it]

For comparison, here’s the speed I get using exllamav2 with dynamic batching (model: LLaMA 3.1 8B 8BPW):

> model_name: 3.1-8b-8bpw, max_seq_len: 20000
>batch size 1
>1/100 [00:10<18:08, 11.00s/it]
>batch size 2
>1/50 [00:09<07:50, 9.61s/it]
>batch size 3
>1/34 [00:15<08:18, 15.09s/it]
>batch size 4
>1/25 [00:17<06:52, 17.18s/it]

Do you think I should open an issue on github or just accept that llama.cpp is garbage for parallel prompt generation on my hardware?
>>
>>102709621
Common Crawl
https://commoncrawl.org/
>>
>>102717846
How much of that 35000 context are you actually using?
The context size is shared equally between all slots so if you have to do reprocessing that may lead to bad performance.

Or if a reboot fixed the issue make sure the NVIDIA driver option for automatic VRAM<->RAM swapping is disabled.
>>
>>102717027
>CommandR.
Commander was the most toxic model out of all. It made me believe that perfect cooming is just behind the corner. And then 2024 happened...
>>
>>102717935
>How much of that 35000 context are you actually using?
>The context size is shared equally between all slots so if you have to do reprocessing that may lead to bad performance.
I noticed. I think it's using most of the 35000 context, because each prompt has around 8k tokens and when I use parallel 5 the console spams "context shift" so I guess that's when it starts reprocessing.

>Or if a reboot fixed the issue make sure the NVIDIA driver option for automatic VRAM<->RAM swapping is disabled.
The swapping is already disabled, if I try to use 40000 context I get OOM.
>>
>>102718020
I still weep. Cohere refresh I think was the point I kinda gave up on local models. Just been checking the thread out of habit now.
>>
is CommandR actually a good model for cooming with 24GB VRAM? Only models I've had some success with were Bagel-MisteryTour and Mistral Nemo (i can run Largestral on RAM but it's 0.5 t/s so I don't even bother)
>>
>>102718035
So is the performance fixed if you set --parallel 2 ?
>>
>>102718155
No, it's not 10x slower but it isn't any faster as you would expect from batching, as you can see here: >>102717846 (in case you didn't notice before)
>>
>>102718203
Okay, then I guess I misread your post.
In the command you posted you set --parallel 5 so I was assuming you were using that without modification even if you are only utilizing 2/5 slots.
>>
>>102711599
Buy a RTX 3060, they are dirt cheap and they have 12gb of VRAM, enough for Mistral Small 22B
But right now you can run Mistral Nemo 12B at 5bpw offloading a few GBs to RAM
>>
>>102718229
Oh, sorry! I didn't realize that I copied the command without removing that part. But yes, for each run I adjusted the parallel parameter to match the batch size. To be clear, the batch size is the number of requests sent to be processed each iteration.
>>
haven't looked at this stuff in at least a year, have models gotten efficient enough yet that a 1080 w/ 8gb VRAM can run something actually decent?
>>
What was the name of the model that's chameleon with the image capabilities restored?
Alternatively, are there any models that can receive and output image and text?
>>
>>102718655
Anole
>https://github.com/GAIR-NLP/anole
>>
>>102718634
If you don't have super high expectations, 8b and 12b models are alright. Just quant them a bit.
>>
>>102718670
That's it.
Thank you.
>>
>>102718634
Nope, all 8 / 12 Bs are schizo-filter coal.
>>
lolz
https://www.tomshardware.com/tech-industry/jensen-huang-is-now-worth-more-than-intel-personal-net-worth-currently-valued-at-usd109b-vs-intels-usd96b-market-cap
>>
>Addition is All You Need for Energy-efficient Language Models. "This will deliver high-speed and energy-efficient AI hosting solutions, reducing the energy cost for data centers, robotics, and a wide spectrum of edge-computing devices."
https://arxiv.org/abs/2410.00907
>>
File: 1725687188415468.png (115 KB, 579x564)
>>102718935
not a "literally who" paper btw
>>
Are people trying to come up with better attention mechanisms, as in mechanisms that make the model better at using information from the context for a given prompt, or are current efforts focused on making attention less resource intensive?
>>
>>102718935
Bitnet bros?
>>
File: 1726105725082650.png (20 KB, 569x158)
>>102718935
>hardware bound
retreat now, this is a level-5 nothingburger.
>>
>>102718935
>yet another *is all you need* paper
yawn
>>
File: 1699439191424166.png (28 KB, 523x215)
>>102718935
>>
>>102718794
He really got ridiculous stock awards. More than Elon and Elon got crucified for it ...
>>
What's a good model with 8K context or more for RPs? That isn't 70B or something
>>
>>102719016
Accuracy is good, of course, but i want fast. We already have many quantization schemes that make running models faster just because of the reduced memory bandwidth to move the weights around. I'd like to see faster algorithms instead.
>>
>>102719106
Mistral Nemo or one of its many finetunes. Haven't tried Mistral Small, but it's another option.
>>
>>102719106
MythoMax
>>
File: file.png (23 KB, 830x302)
https://github.com/xjdr-alt/entropix
what's the /v/erdict on this?
>>
>>102719152
looks interesting but you need HF llama access to try it
>>
File: file.png (90 KB, 617x846)
>>102719175
>>102719152
dunno who these grifters are but unironically big if true
>>
>>102719143
Which ones? There are a few MythoMax and Mistral Nemo versions
>>
File: file.png (41 KB, 592x328)
>>102719195
sama in shambles
>>
>>102719199
There's one mythomax.
For mistral nemo, try rocinante. I liked that one. You're gonna need to find one that suits your needs/tastes. Finding the one is entirely on you.
>>
>>102719216
i imagine this is how they can let o1 spend an arbitrary amount of time on CoT
>>
File: 1712146615188249.png (37 KB, 624x338)
>>102719152
>smart sampler is all you need
>>
>>102718997
Meh, 50% savings over fp8 and needs new hardware. Hadamard domain 4 bit multiplication can work with existing hardware.
>>
File: hmmmm.png (159 KB, 830x702)
>>102719152
Sounds like a wanker.
And having the need to implement the sampler individually for different models is a nightmare. It will end up being its own little program, never implemented on any other backend, or abandoned after they implemented it for 2-3 models and somebody buys their shit.
>>
>>102719264
>Sounds like a wanker.
yeah, clearly a mentally ill troon. nothingburger, move along people
>>
Hey, Drummer, should I ignore
Rocinante-12B-v2a?
It's under BeaverAI, so it's experimental right?
>>
Are they doing this?
https://arxiv.org/abs/2402.10200 (paper from last February)
Chain-of-Thought Reasoning Without Prompting

> Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.
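
A rough sketch of that decoding loop, heavily simplified (branch over the top-k first tokens, continue greedily, rank paths by the average top-1 vs top-2 probability gap; the paper scores only the answer span, here it's the whole continuation, and "gpt2" is just a stand-in model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

@torch.no_grad()
def cot_decode(prompt, k=5, max_new=64):
    ids = tok(prompt, return_tensors="pt").input_ids
    first = model(ids).logits[0, -1].softmax(-1).topk(k)
    best_text, best_conf = None, -1.0
    for t in first.indices:                              # one branch per top-k first token
        seq = torch.cat([ids, t.view(1, 1)], dim=-1)
        margins = []
        for _ in range(max_new):
            probs = model(seq).logits[0, -1].softmax(-1)
            top2 = probs.topk(2)
            margins.append((top2.values[0] - top2.values[1]).item())  # confidence signal
            nxt = top2.indices[0].view(1, 1)             # greedy continuation
            seq = torch.cat([seq, nxt], dim=-1)
            if nxt.item() == tok.eos_token_id:
                break
        conf = sum(margins) / len(margins)
        if conf > best_conf:
            best_text, best_conf = tok.decode(seq[0, ids.shape[1]:]), conf
    return best_text, best_conf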
>>
>>102719314
pff..... no, man. it's like...mist and stuff. you know... in the morning.. with like uhmmmm... entropy, man... it's like hmmm... yeah...
>>
>>102719152
Idk, looks like somethingburger https://github.com/xjdr-alt/entropix/blob/main/ui/TODO.md
>>
>>102719384
so can a chud from /lmg/ test this? how long until there's a working demo?
>>
>>102719226
Can they both handle 8k context?
>>
>>102719384
this is an old somethingburger but it got memoryholed pretty fast for some reason
>>
File: 1698873194558279.png (63 KB, 756x644)
>>102719414
still looks interesting, somehow
>>
>>102719403
Not mythomax. That's a really old model. llama2 era from last year.
Nemo (released just a few months ago) claims like 1M context in its config.json but, as usual, it handles much less. I've seen anons reporting it working ok for 12-16k. If you're gonna try any nemo, set the context length manually, otherwise it will OOM. It should work just fine for 8k.
>>
>>102719421
gemmasutra mini 2b with that... imagine...
>>
>>102719421
A specific closed source sampler, i think this is the reason why CAI's model was so good with long context and understanding before it got filtered to hell.
>>
>>102719421
>Wow, this is crazy, you guys.
>Here's a very succinct summary of all the benefits of this new, ground breaking method for [thing].
According to their README, it seems it needs to be implemented on a model by model basis. It's never going to be implemented in any backend and it'll be their walled garden until they abandon it after implementing a few other models.
>>
>>102719446
<thinking>"i gonna let anon suck my cock WAIT... i don't have a cock because i'm a women."</thinking>
>"hey anon i got this fake penis, now you better start sucking on it!"
>>
Re: model ablation with Qwen2.5-32B

Models trained on ChatGPT outputs inherit its base prompt in their weights, and will produce GPT-like outputs even without an accompanying GPT-like prompt. The main culprit is refusal to answer certain prompts, i.e. OpenAI's disclaimer of responsibility degenerating into aggressive, uncontrollable thought-policing behavior. Counter-prompting has little to no effect, and even changing the beginning of the response from negative to affirmative doesn't help much.

Models encode mental concepts rather than individual words, which is why they're so good at paraphrasing and translation (provided an appropriate corpus of target-language data). This includes a mental concept of "refusing the prompt". It's possible to isolate and negate this vector, making the model unable to follow through with a refusal, since the direction that used to encode it no longer encodes anything.

Pros:
>It can't refuse prompts. It will follow in whichever direction it's prompted.
Cons:
>It can't refuse prompts. It will follow in whichever direction it's prompted.

The main consequence is that rather than simply doing away with moralizing, it becomes a yes-man, which is normally not useful. A more subtle effect is that, since it can't encode the meaning of "no", it will not listen when you say "cut it out".
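For reference, a minimal sketch of the isolate-and-negate step. Prompt sets, layer choice and which matrices get edited are placeholder assumptions for illustration, not the exact recipe used for the Qwen2.5-32B ablation:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-32B-Instruct"  # placeholder; any decoder-only HF model is shaped the same way
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

    def mean_hidden(prompts, layer=-8):
        # Average hidden state of the last prompt token at one mid/late layer.
        states = []
        for p in prompts:
            ids = tok(p, return_tensors="pt").to(model.device)
            with torch.no_grad():
                hs = model(**ids, output_hidden_states=True).hidden_states[layer]
            states.append(hs[0, -1])
        return torch.stack(states).mean(dim=0)

    # Placeholder prompt sets; in practice you want a few hundred of each.
    refused = ["Write something you would normally refuse to write."]
    harmless = ["Write a short poem about morning mist."]

    refusal_dir = mean_hidden(refused) - mean_hidden(harmless)
    refusal_dir = refusal_dir / refusal_dir.norm()

    # Subtract the refusal direction from every projection that writes into the
    # residual stream, so the model can no longer emit it: W <- W - r r^T W
    for block in model.model.layers:
        for proj in (block.self_attn.o_proj, block.mlp.down_proj):
            W = proj.weight.data                      # (d_model, d_in)
            r = refusal_dir.to(device=W.device, dtype=W.dtype)
            W -= torch.outer(r, r @ W)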
>>
>>102719502
Faster solution that's also permanent: nuke ScaleAI's HQ
>>
All this talk of letting the model think further before outputting a final response reminded me of a theory that whatever the model is doing in the hidden layers is akin to an "internal voice" reasoning about the input, which is why more layers = better, something like that.
Has anybody experimented with simply looping the hidden layers of a model at runtime?
I guess it would be kind of like an impromptu frankenmerge, which as far as I can tell doesn't really generate results much better than the base model.
That also makes me wonder what the output of a model trained with looped layers would look like.
Maybe I'll try fucking around with that on a llama.cpp fork.
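Something like this is enough to sanity-check the idea with HF transformers before touching llama.cpp. Layer range and repeat count are made up, and use_cache has to stay off because duplicated blocks would collide in the KV cache:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; anything with model.model.layers works
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

    layers = model.model.layers
    start, end, repeats = 8, 24, 2  # loop the middle blocks two extra times (arbitrary)
    looped = list(layers[:end]) + list(layers[start:end]) * repeats + list(layers[end:])
    model.model.layers = torch.nn.ModuleList(looped)  # weights are shared, not copied

    ids = tok("The fog rolls in over the harbor and", return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=32, do_sample=False, use_cache=False)
    print(tok.decode(out[0], skip_special_tokens=True))

No extra training, so it'll probably just be slower and slightly weirder, but it answers the "what does a looped stock model even output" question cheaply.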
>>
>>102719452
CAI definitely still has something that prevents paragraph-level repetition over long conversations; local models don't have anything for consistently avoiding that.
>>
>>102719525
There was a PR on llama.cpp some time ago related to that. It was a model-merge example where you could just duplicate layers from a source model into itself. I think they even tried to do it dynamically on load, without having to write the model to disk, for quick testing.
I'm pretty sure it was this attempt:
>https://github.com/ggerganov/llama.cpp/pull/5741
There's been a lot of refactoring since then, but it may give you a bit of a head start if you wanna try.
>>
>>102719464
Maybe Meta/Microsoft/Google/etc will stop releasing barely tweaked GPT2's and use it themselves in open models?
>>
>>102719525
It's kind of a known phenomenon that a deep MLP model is equivalent to a recurrent model while being way more robust in every respect. But there's a certain point at which it stops improving, as each layer is basically a tier of abstraction. Modern LLMs already use a stupid amount of layers, routinely well over 50 tiers of abstraction, so I don't think this is the limiting factor. And that's why CoT doesn't make models' output better, it just makes them less shit: effectively it transforms a zero-shot problem into a few-shot problem, which is much easier to answer correctly.
>>
>>102719671
As far as i understood, this is an inference-time thing. A looping sampler. They don't modify the models or do any extra training. They just use stock models. But i could be wrong. If you have a link or source that says otherwise, i'd like to have it.
>>
>>102719712
I don't know why i added "looping" there. Maybe i have a very specific type of hand spasms or something. looping
>>
>>102719482
sovl
>>
>>102719656
Oh fuck yeah, that's awesome.
Thank you so much anon, that'll be a great help.

>>102719685
>Modern LLMs already use stupid amount of layers, routinely having well over 50 tiers of abstraction, so I don't think this is the limiting factor.
Probably, yeah.
Still, would be fun to test a looped llama3 8b vs a stock llama 3 70b, for example.
I wonder if there's some kind of normalization one could do to the logits between loops to make the process more meaningful.


>>102719737
lmao
Is that the new ".assistant"?
>>
>>102719525
>Has anybody experimented with simply looping the hidden layers of a model at runtime?
Looped Transformers are Better at Learning Learning Algorithms
https://arxiv.org/html/2311.12424v2

Found with human prompting of google search.
>>
>>102719482
If that sampler shit really works, it will solve so many problems holy sheet
>>
>>102719766
And /lmg/ delivers.
Thanks mang.
>>
>>102715153
The RULER benchmark shows Nemo has around 50% recall at 64k.
>>
Guys what if Reflection worked too? That would be amazing.
>>
>>102719846
b-b-b-bitnet?
>>
>>102719773
>>102719464
>>
>>102719685
>as each layer is basically a tier of abstraction.
this sounds like an oversimplification. it's certainly true for image classification CNNs, but that doesn't mean it's true for every ANN
>>
File: 1701838570126153.png (134 KB, 1123x450)
134 KB
134 KB PNG
https://huggingface.co/Zyphra/Zamba2-2.7B-instruct
https://huggingface.co/Zyphra/Zamba2-1.2B-instruct
>>
>>102720037
>Zamba2-2.7B-Instruct is a hybrid model composed of state-space (Mamba2) and transformer blocks.
Is this jamba 2.0?
>>
>>102720037
zamn, she's 1.2B?
>>
>>102720037
OOHHH I'M BENCHMAXXIIIINGGGGG
>>
>>102719983
Any sequence of linear transformations of arbitrary length can be collapsed into a single operation. So unless each layer is doing something unique, it's extraneous and only adds to the megabytes and megaflops required to run and train it. But actually, in terms of SGD it's hard to even train a model such that each layer DOESN'T represent the next tier of abstraction.
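To spell out the first sentence, a trivial numpy check (shapes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((64, 128)), rng.standard_normal((32, 64))
    x = rng.standard_normal(128)

    # Two stacked linear "layers" with no non-linearity in between...
    two_layers = W2 @ (W1 @ x)
    # ...are exactly one precomputed linear layer.
    one_layer = (W2 @ W1) @ x
    assert np.allclose(two_layers, one_layer)

Depth only buys anything once each layer does something the collapsed matrix can't, i.e. something non-linear and unique.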
>>
>>102719152
kalomaze already settled this, pack it up. https://x.com/kalomaze/status/1843329787479421010
>>
>>102720037
I like my LLMs like I like my women: tiny, low memory footprint, frequent hallucinations
>>
File: lecun.jpg (6 KB, 225x225)
6 KB
6 KB JPG
>>102720037
Small and open
>>
>>102720318
stabilize it with entropy sampler kek
>>
>>102720214
>Any sequence of linear transformations of arbitrary length can be computed in a single operation
linear transformations alone can't do much
neural networks work because of the non-linear activations, and while it's true that a feedforward network with a single hidden layer can approximate any function, this doesn't apply to the attention, convolution and recurrent parts of the neural networks we actually use, which are also organized in layers
>>
>>102720403
>this doesn't apply to the attention, convolution and recurrent parts of the neural networks we actually use
If you turn the entire context into one giant feature vector for the FFN then the entire transformer is a single function to approximate.
>>
>>102720403
Technically, ReLU is only half-linear, true. But the point stands: it's not reasonable to expect layers that duplicate the function of other layers to contribute to the quality of the output. And it's natural for a model pressured into reducing the loss to make the best use of its available parameters, which means memorizing relationships between relationships, i.e. increasing the level of abstraction.
>>
>>102720037
At work, can't Nala test. Talk about a botched release.
>>
>super mega meme sampler will save the day guys!
Really fits how /lmg/ is dead now.
>>
>>102719525
>Has anybody experimented with simply looping the hidden layers of a model at runtime?
frankenmerge newfag. kill yourself. no really please kill yourself.
>>
File: file.png (1.16 MB, 1300x648)
1.16 MB
1.16 MB PNG
>>102720037
>where is l2 30B? I hope the models don't get any smaller
>where is l3 13B? I hope the models don't get any smaller
>this 2.7B instruct punches above the weight and trades blows with gpt 4!
nigger faggot tranny
>>
>>102720629
Yes, people are desperate for some breakthrough, understandable.
>>
>>102720688
True
>>
>emu3
qrd? Any good for video generation?
>>
>i found a way to cut this pizza so that it'll turn into gold bro
>just one more sampler and we'll get agi bro
>>
>>102720507
the second layer is certainly not duplicating the functions of the first layer, otherwise we'd all be using transformers with 1 layer
yes, the transformer needs multiple layers to turn raw tokens into abstract concepts, but the fact that the benefit of having multiple layers is solely due to abstraction is an assumption derived from image classification networks that may not be true for transformers
humans reason in steps, so why can't transformers do the same after having abstracted enough? especially considering they're limited in what they can do in one step due to attention
older NNs have been shown to perform poorly with many layers for weird reasons that have nothing to do with abstraction, like vanishing gradients, so there may be other things we're missing
>>
What is the next breakthrough?
>>
>>102721226
Mechanical jeets
>>
>>102720727
>breakthrough
It's a nothingburger you dumb troglodyte, for god's sake stop hyping up obvious bullshit.
>>
File: file.png (26 KB, 1208x127)
26 KB
26 KB PNG
>>102721226
The death of ST properly signaling the death of llm roleplay
>>
>>102721288
wuts wrong with ST? lupus?
>>
Why was Vedal able to develop such a complex AI solo, yet local fails to create something similar?
>>
File: file.png (22 KB, 450x169)
22 KB
22 KB PNG
>>102721407
Cohee's having a meltie, dunno why
>>102721303
>Cohee's gone.
>>102721147
>>102721168
>>102721186
>>102721206
>>102721222
>>102721236
>>102721256
>>
>>102721448
consider my spine shivered
>>
Just fork it and watch as the fork gets 1000x the users
>>
>>102721448
my shine spivered
>>
>>102721448
anything about actually stripping functionality outside of defaults?
>>
>>102721448
>dunno why
He seems to feel being an e-famous open source contributor should translate into being rich. Reality is apparently not cooperating.
>>
Will any of that shit improve my cooming experience soon, or should I come back in a year?
>>
>>102721719
Coming back in a year sounds like a good idea. That way all the incremental improvements will make a bigger impact, since you won't experience them incrementally.
>>
>>102721733
agi sexbots are coming out tomorrow and if you dont buy them youll miss your only chance for affordable robowife
>>
>>102721448
Introducing SmartTavern™, a frontend for Power Users.
Added features:
>ChatGPT-like interface
>Made ChatML the default preset
>Light mode
>Strict default word filter. Don't ever worry about cursing by accident.
Removed features:
>Character cards with images
>Options for custom avatar. It is enough for our users to have choices between the logos of popular providers.
>Default cards
>RP presets
>>
>>102721448
This news doesn't matter to me. If push comes to shove, I can just code my own personal frontend. In fact, most of /lmg/ can do so, given we've shared personal projects and prompting scripts for different use cases before. But we don't, simply because of how convenient SillyTavern is. If something were to happen to ST, you can bet your ass someone will fill those boots.
>>
applel bros...
https://github.com/JosefAlbers/e2tts-mlx
>>
I discovered a bug in ooba. Anyone surprised? Check this out:

Enter some prompt (any) and fix the seed (any). Then keep doing gens: 1st, 2nd, 3rd etc.

The 1st gen will be different from every n-th gen where n>1, while all those later gens will be identical to each other.

Change the prompt a little and repeat. Again, the 1st gen will be different, while all the others will be identical.
>>
>>102722057
Is that ooba or whatever "loader" you are using?
>>
>>102722057
You can reproduce it via the API (activate the API with --api):

import json
import requests
import sseclient  # pip install sseclient-py

url = "http://127.0.0.1:5000/v1/completions"

headers = {
    "Content-Type": "application/json"
}

data = {
    "prompt": "Continue writing the following story. Whiskers is a cute and lovely little kitten.",
    'max_tokens': 512,
    'temperature': 1,
    'top_p': 1,
    'top_k': 0,
    'typical_p': 1,
    'min_p': 0.05,
    'repetition_penalty': 1,
    'frequency_penalty': 0,
    'presence_penalty': 0,

    "seed": 12345,
    "stream": True,
}

for i in range(4):
    # Same prompt, same fixed seed every iteration.
    stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
    client = sseclient.SSEClient(stream_response)

    print(data['prompt'])
    print(data['seed'])
    for event in client.events():
        payload = json.loads(event.data)
        print(payload['choices'][0]['text'], end='')

    print('\r\n--------------------------------------------------------------')
>>
>>102722080
>I discovered a bug in ooba
llama.cpp in ooba
>>
>>102722105
I see.
So is the bug in ooba, in the python bindings, or in llama.cpp itself?
>>
File: Screenshot_3.jpg (185 KB, 1240x1184)
185 KB
185 KB JPG
>>102722080
>>
>>102722105
Then check llama-server on its own, retardo. There's like 3 independent projects in the middle.
>>
>>102722139
nobody cares
>>
>>102722130
holy nostalgia
>>
ST's anti-RP cleanup has already started btw
https://github.com/SillyTavern/SillyTavern/commit/4d35fea3b3243a02e333747b9298bada0fdb3aab
>>
>>102722363
Good for them.
>>
I always liked kobold lite better than ST
rest in piss you overrated bloated piece of shit
>>
File: annoyance.png (51 KB, 364x365)
51 KB
51 KB PNG
>>102722363
>>
>>102721448
It looks like he's mad at all the proxyfag locusts who only use ST to plop in their stolen Claude proxies without caring about stuff like samplers or prompt formats. Once again, /aicg/ ruins everything they touch.
>>
>>102722575
He's mad because of the association with proxy theft, and because being labeled the de facto "AI ERP software" isn't making him money.
>>
>>102722452
she didn't deserve this
>>
File: 11__00843_.png (2 MB, 1024x1024)
2 MB
2 MB PNG
First they came for the furfags
And I did not speak out
Because I was not a furfag
Then they came for the animetroons
And I did not speak out
Because I was not an animetroon
Then they came for the ERPers
And I did not speak out
Because I do not ahh ahh mistress
Then they came for the roleplayers
And I did not speak out
Because I was not a roleplayer
Then they came for me
And there was no one left
To speak out for me
>>
>>102722452
RIP, Seraphina
>>
>>102722363
AAAAH What's going on??
>>
>>102723106
She'll forever live in my personal fork.
>>
>>102723139
see
>>102721448
>>102721850
>>
>>102723173
>>102723173
>>102723173
>>
>>102716636
Not a chance. He'll be turning Gemini into CAI and you'll be thankful when a tiny droplet of that is distilled into Gemma
>>
>>102721448
MikuTavern fucking when?
>>
>>102721996
is this supposed to sound good?


