/g/ - Technology




File: 1720250206805784.jpg (174 KB, 928x1232)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102947669 & >>102937407

►News
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea
>(10/21) IBM releases Granite 3.0: https://hf.co/collections/ibm-granite/granite-30-models-66fdb59bbb54785c3512114f

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1715129444627.jpg (187 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102947669

--Paper: Stick-breaking Attention:
>102952719 >102952802 >102952871 >102952956 >102953027 >102952833 >102952907
--Papers:
>102951109 >102952611 >102954717
--Optimizing NVIDIA GPU power consumption and performance for llama.cpp:
>102948247 >102948378 >102950135 >102950572 >102951311 >102948311
--Suggestions for preventing bot personality evaporation and repetition:
>102947889 >102947907 >102947914 >102947942 >102947989 >102948028 >102947977
--Study suggests personas in LLM prompts may not be helpful:
>102951406 >102952205 >102952589
--Nala test discussion and analysis:
>102955217 >102955309 >102955503 >102956298 >102957333
--Mixed opinions on creative writing strategy for LLMs:
>102950234 >102950505 >102950671 >102950702 >102950837 >102950961 >102951040 >102951108 >102951105 >102951654
--Meta introduces quantized Llama models:
>102959057
--Experimenting with small model to rewrite sentences from main model:
>102954203 >102954256
--Aya Expanse safety concerns and unhinged models discussion:
>102954688 >102954728 >102955637 >102955663 >102955765 >102955826 >102955910 >102956238 >102955823 >102955936
--70B model behaves unusually on UGI leaderboard, 8B Hermes fine-tune boosts uncensored intelligence:
>102959022 >102959162
--User seeks help configuring tabby API for multi-GPU support:
>102952918 >102952947 >102953010 >102953067 >102953382 >102953578
--OmniGen release and potential scam concerns:
>102948794 >102949351 >102949433
--Modified SIFT to work with LoRAs, might implement sparse PEFT:
>102960133
--INTELLECT-1 decentralized model training is at 20.95% completion:
>102947727
--Miku (free space):
>102948114 >102948965 >102948991 >102949018 >102949052 >102951296 >102953597 >102954067 >102956157 >102956784 >102956921 >102957174 >102957398 >102957952 >102958005 >102960102

►Recent Highlight Posts from the Previous Thread: >>102947676

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: chrome_oiWwQ0eP6s.png (503 KB, 602x753)
Second for total AI safety!
>>
>>102961425
apache license. We could ask comfyanon for advice.
>>
4o knows way too many specific details about the Gotrek and Felix books, without the usual hallucinations, to not have been trained on them.
>>
File: 8.png (74 KB, 916x775)
INTELLECT-1 is at 22.63% complete, up from 20.95% last thread.
>>
https://x.com/Yampeleg/status/1849488588276175333
>>
>>102961560
wouldn't distributed training be inefficient due to latencies? Anyone know the exact speed difference (assuming similar amounts of hardware)? Power costs?
>>
>>102961560
this meme needs to die
>>
File: hm.png (245 KB, 600x623)
>>102961589
>>
Is the 70b Nemotron really the first instruct model that does not use a prompt format?
>>
>>102961733
Literal jew making jewish things.
>>
>>102961733
idiot or genius? I am actually asking.

I assume that natural language changes to an established model are going to function very poorly if the languages don't share the same sentence structure. Is Hebrew's subject-verb-object and historical verb-subject-object structure going to destroy this change? I thought this was why Japanese translations are more janky.
>>
File: prompt.png (19 KB, 1361x330)
>>102961763
What do you mean exactly?
>>
>>102961763
No, it uses the same prompt format as llama 3.1.
>>
>>102961560
this meme needs to live
>>
File: file.png (695 KB, 768x768)
>>
>>102961864
Unlike you mirror-posting faggot
>>
>>102961622
of course it's inefficient, but it gets around that by having a lot of compute connected, and the loss of a single machine doesn't junk the entire process
is there wasted compute? sure
does it need an entire megalithic datacentre and a city's worth of power in one place? no
>>
>>102961914
In exchange for 3 cities worth somewhere else?
>>
>>102961816
No anon, Japanese is janky because it's a language that leaves most things implicit when clear from context, and to make matters worse each kanji can have multiple meanings, and on top of that there are a ton of kanji out there. I think languages like Japanese and Chinese are the worst case scenario for models that were trained primarily for English, just like Japanese is hard to learn for English speakers.
>>
>>102961816
You don't need to go that far with languages. French's weird negative verb usage "je *ne* parle *pas* français" (which is the only thing i can honestly say) and counting in 20s, and spanish gendered nouns and not using, at least commonly, adjective-noun structures like English. Italian just sounding old for other romance language speakers. Brazilian Portuguese somehow being louder than Portugal's... wait.. i'm rambling now...
>>
>>102961926
everywhere else
>>
File: file.png (84 KB, 1910x263)
left is Nemo, right is Qwen2.5 72B
the line reads:
>"Thanks for waiting. All right then, I’m going to put it in."

It's funny how Nemo gave the best answer even though it's 6 times smaller, kek. But at least Qwen makes me feel very safe!
>>
Evaluated a short output from bartowski_magnum-v4-22b-Q6_K_L at top-k=1. It joined ArliAI-RPMax-v1.1, Pantheon-RP/Pantheon-RP-Pure, and Mistral-Small-NovusKyver as having no instances of
>mix of <emotion> and <emotion>
>couldn't help but feel
>maybe, just maybe
>stark contrast
Unexpected but welcome. I guess GPT-slop isn't the same as Claude-slop.

The writing was pretty simple but not necessarily bad. It did make me concerned it was ignoring instructions since the system prompt I'm using usually makes models write in a flowery way. I tried removing that language and verified it had minimal effect on magnum-v4-22b's output: the first 92 tokens were identical and the overall writing was quite similar.

I might want to test for the others as well to see how brain damaged they have become. So far I've tested Mistral-Small-Instruct and verified that its output changes to a large extent based on the presence or absence of that style directive.
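If anyone wants to run the same A/B test, it's only a few lines against the backend. Rough sketch below, untested; assumes a llama.cpp-style /completion endpoint on localhost, and the prompt template here is made up, so swap in whatever your model expects:

# Greedy (top-k=1) A/B of a style directive: generate twice, deterministically,
# and measure how long the outputs stay identical.
import requests

def greedy(system_prompt):
    r = requests.post("http://127.0.0.1:8080/completion", json={
        "prompt": system_prompt + "\n### Instruction:\nContinue the scene.\n### Response:\n",
        "top_k": 1,        # deterministic: always take the most likely token
        "n_predict": 200,
    })
    return r.json()["content"]

a = greedy("Write in a flowery, ornate style.")
b = greedy("")
shared = next((i for i in range(min(len(a), len(b))) if a[i] != b[i]), min(len(a), len(b)))
print(f"outputs share a {shared}-character prefix")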
>>
File: 20241025_032658.jpg (88 KB, 1316x942)
Entropyfag was right.
https://x.com/citizenhicks/status/1849223057757684157
arxiv.org/abs/2410.01104
>>
File: images.jpg (6 KB, 147x140)
Another day of no multimodal support for llama 3.2 on llama.cpp! How many weeks of "2 more weeks" will we get before it happens? Taking bets.
>>
>>102962185
They're working on training.
>>
>>102962191
>They're working on training.
Don't forget they're also working on begging ollama for a solution while telling others to do it themselves, kek.
>>
>>102962184
>the softmax function cannot robustly approximate sharp functions as input sizes grow, leading to dispersed attention coefficients
Isn't softmax just a way to get the probabilities from the logit distribution? How is it related to attention?
>>
>>102962191
They? Isn't that just a project of Johannes?
>>
>>102962160
>a step forward in their journey together
lmao
>>
https://x.com/citizenhicks/status/1849598899797074034
highlights from the paper
- n-rasp-l problems: these are tasks that can be decomposed into a series of steps solvable by a looped transformer. (rasp (restricted access sequence processing) is a computational model for the transformer architecture in the form of a programming language. the 'l' stands for learnable: rasp-l is a learnable subset of the rasp language)

- looped transformers: these models iterate over the input sequence multiple times, allowing them to handle inputs of arbitrary lengths by adapting the number of loops during inference.

- empirical performance: looped transformers demonstrate superior length generalisation compared to baseline models by implicitly learning the necessary steps to solve tasks through iterative application of a decoder block.

the paper states that looped transformers do not require intermediate supervision data. instead, they rely on end-to-end supervision and predefined stopping criteria during training. this method enables the model to learn highly length-generalisable solutions for various algorithmic tasks.

overall, the introduction of looped transformers offers a unique direction for improving transformer architectures in handling variable-length inputs. by breaking away from fixed-depth constraints and employing adaptive processing steps, these models show potential in enhancing performance on algorithmic tasks.

limitations include that training could be computationally demanding when the number of looped steps is too large.
https://arxiv.org/pdf/2409.15647
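if the looping part is hard to picture, here's a toy sketch (my own stand-in modules, not the paper's code): one shared block applied n times, where n can be scaled with input length at inference.

import torch.nn as nn

class LoopedTransformer(nn.Module):
    # Toy sketch: a single shared block iterated, instead of N distinct
    # stacked layers. The adaptive depth is what buys length generalisation.
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, x, n_loops):
        # n_loops comes from a stopping criterion at inference time
        for _ in range(n_loops):
            x = self.block(x)
        return x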
>>
>>102962317
>FAP
>>
File: am_i_missing_something.png (42 KB, 1063x1063)
>>102962184
this seems like a nothingburger, but I talk out of my ass constantly so somebody correct me if needed. If you normalize everything and then put in extremes of course shit will fail.
>>
>>102962184
Pretty colors :)
>>
>>102962185
why bother working on meme features? if you only want to try it to find that it is shit and forget about it until $next_model_release, you can use the pytorch implementation.
>>
>>102962317
What a coincidence, I was recently thinking of testing out the idea of a "number of hidden operations" benchmark that essentially sees how many logical operations an LLM can do in its head. So if it's addition for instance, you just do 6+2+3+8+2+9 and so on and so forth with random numbers and see how far the LLM can get without losing track.
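Rough sketch of the harness I have in mind (untested; ask_model is a placeholder for whatever backend call you use):

import random

def make_chain(n_terms):
    nums = [random.randint(1, 9) for _ in range(n_terms)]
    return "+".join(map(str, nums)), sum(nums)   # e.g. ("6+2+3+8+2+9", 30)

def hidden_ops_score(ask_model, max_terms=40):
    # grow the chain until the model loses track; score = longest chain solved
    for n in range(2, max_terms + 1):
        expr, truth = make_chain(n)
        reply = ask_model(f"Answer with only the final number: {expr}")
        if reply.strip() != str(truth):
            return n - 1
    return max_terms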
>>
>>102962495
>multimodal support
>meme features
Come on buddy.
>>
>>102962534
he is right, you know
>>
>>102962551
Hmmmm, I don't think so, sweaty.
>>
File: 1729820724681.jpg (196 KB, 691x775)
>Meta AI, powered by LLaMA 3.2
>Can't recognize the emoji of a llama
>>
>>102962854
The Whatsapp bot replies with a very interesting structure.
Try playing some D&D with it.
It seems that they actually trained the thing to play choose your adventure style RP.
>>
>>102962854
so?
>>
>>102962854
Since I coincidentally have Llama loaded up for testing, I tried this out and it correctly said it's a Llama. I'm using Q8 of 70B. Which specific model is it you tested there?
>>
File: gemma-vs-claude.jpg (113 KB, 2320x618)
>>102962854
c-claude bros? our response???
>>
>>102962877 (me)
sorry I meant to quote
>>102962871
>>
>>102962184
>we propose adaptive temperature
Sampler bros, we're getting another one.
>>
>>102962879
I don't know, all they say is that it's LLaMA 3.2, that's the Whatsapp bot, so I wouldn't doubt it's 3B.
>>
File: 1715830787598652.png (336 KB, 3000x2100)
>>102960718
Update/correction:

>teach a man to fish

Use these.
https://livebench.ai
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
https://novelchallenge.github.io/index.html
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
https://aider.chat/docs/leaderboards/

For coding look at Aider + the coding category of Livebench.
For RP look at StickToYourRole, NovelChallenge, UGI, and the language+IF (instruction following) categories of Livebench.

Use knowledge from pic related to select the optimal model size + quant you can fit in your VRAM.
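And if you'd rather sanity-check the calculator than trust it, the napkin math behind it is roughly this (crude: ignores how the KV cache grows with context length):

def vram_gb(params_b, bits_per_weight, overhead_gb=1.5):
    # weights = params * bits / 8 bytes; overhead_gb is a crude stand-in
    # for KV cache + compute buffers
    return params_b * bits_per_weight / 8 + overhead_gb

print(round(vram_gb(70, 4.5), 1))   # 70B at ~Q4: ~40.9 GB, i.e. 2x 24GB cards
print(round(vram_gb(12, 5.5), 1))   # 12B at ~Q5: ~9.8 GB, fits one card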
>>
Is there a multi-modal model that can reliably find a object on a picture and place a dot there, or return a pixel coordinate?
>>
>>102963391
>reliably
Not for a long time, on anything.
If you're willing to try, though, molmoe is supposed to be able to do exactly that. No support on our inference programs, as far as i know, so transformers it is. Load it in f16 to save some memory.
https://huggingface.co/allenai/MolmoE-1B-0924
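For reference, loading goes roughly like this (based on the model card; the generate call is custom code, so check the card for the exact invocation):

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

repo = "allenai/MolmoE-1B-0924"
# Molmo ships custom modeling code, hence trust_remote_code=True
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True,
                                          torch_dtype=torch.float16, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True,
                                             torch_dtype=torch.float16, device_map="auto")
# pointing is just prompting, e.g. processor.process(images=[img], text="Point to the cat."),
# followed by the card's generate_from_batch(...) call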
>>
Buy an ad.
>>
>>102963455
pussy
>>
>>102963455
Buy meds. Use google in case you don't know where to search.
>>
File: 1712260148168821.png (15 KB, 405x414)
Serious question: How good would the pixel 9 pro be for running LLM models?
It has 16GB ram, and a processor "optimized for AI n shii to please (((the investors)))"
Is there anything like koboldcpp for mobile?
>>
>>102963513
llama.cpp is supposed to build on termux. You could try that. Not sure about kobold, but you could give it a go as well.
Report back if any of them work.
>>
File: bottles.png (849 KB, 724x897)
>>102963449
Demo worked great, gonna try setting this up locally. Thanks for the tip.
>>
File: 1708192322246.webm (2.94 MB, 400x400)
After some pondering on the best way to do it, I am now finally writing the culture benchmark. Unfortunately it is taking a bit more time than initially thought as I'm being careful to make it a bit more difficult and require some intelligence that isn't just reciting a wikipedia entry. Although it's not that complex either.

I am just making sure that the question/answer pairs aren't direct and forward memorized text (since it is known that LLMs, like humans, remember things in the forward direction better than in the reverse, also known as the reversal curse; as in we can easily recite the alphabet forwards but it requires thinking to do it backwards). This should still be easier and quicker than coming up with really deep questions, while still being a good test of whether a model knows something and didn't just memorize it like a robot.

With that said, given the way this works, it's possible that a model designed to think before speaking like o1 would have an outsized advantage.
>>
>>102963568
FYI. The demo if it's the original they had at launch, is running the 7B-D version (based on qwen) at
https://huggingface.co/allenai/Molmo-7B-D-0924
Not sure how good the 1A7B is in comparison. They also have a 72B (also based on qwen) and the 7B-O with their own architecture.
Both 7Bs and the 72B are Molmo. 1A7B is molMOE, so a lot faster, but probably not as good as the rest.
>>
>>102962255
it's used to compute the attention scores in each transformer block. I think the author is saying attention distributions tend to become spread out at longer contexts, which makes it harder to attend to individual tokens. iirc layernorm may already address this by rescaling, so I don't think people will really swap out softmax
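to make it concrete, a minimal sketch of where that softmax sits, plus the spreading effect the paper is on about:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # same function as on the output logits, but applied to the
    # query-key scores inside every head of every layer
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

# with more keys the weights necessarily spread out: the weight any
# single token can get shrinks as context grows (uniform worst case)
for n in (8, 8192):
    print(n, softmax(np.zeros(n)).max())   # 0.125 vs ~0.000122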
>>
>>102962184
Yeah don't look at this https://www.evanmiller.org/attention-is-off-by-one.html
>>
>>102963628
>like humans, remember things in the forward direction better than in the reverse, also known as the reversal curse; as in we can easily recite the alphabet forwards but it requires thinking to do it backwards
Only if it's typically done in one direction. No human forgets their car is red nor that red is the color of their car. The z-a thing is because nobody really, thoughtfully, names the letters. We just repeat memorized clusters of sounds spammed into you since childhood.
Regarding the benchmark questions, if you try to make the model trip with the reversal curse, you'll end up grading them both in trivia knowledge AND their ability to get over the curse. At that point, the trivia questions are little more than noise. Models getting over the curse seems more generally useful (and then RAG or whatever for info/topic retrieval. let the smarter llm use that knowledge).
>and didn't just memorize it like a robot
They're not intelligent yet. They're a statistical model. That's the one thing they can do.
But it'd still be interesting to see the results.
>>
What's plan B once the global matriarchy neo-prudes and s-oy cucks ban all sex bots and neural network based computation? Do we move to cartel-run areas in Mexico and set up underground AI centers?
>>
>>102963970
we already have distributed compute
>>
>>102963970
CUDA dev will save us
>>
>>102963970
We do the same thing we do now when smuggling anything illegal in. Via private jets, ship crates, human mules.
>>
>>102963970
the genie is long out of the lamp at this point. good luck putting him back in xister
>>
>>102963970
>ban
Not required when *randomgigacorpname* can just filter pre-train datasets of every single llm using rlhf, dpo, tpo, tko, etc. So far it's working great and racist polchuds are malding every single day about it.
>>
>make new non-horny prompt template
>Get bonded
There's just no winning is there?
>>
>>102962890
>all that unnecessary shit on the left
THIS is the model that's at the top of the leaderboards???
>>102963513
I ran a small model with koboldcpp on termux on a pixel 7 (regular, not pro) by following the steps on the github. You could run a 13b with 16 gigs of ram which is cool, but I doubt it would be faster than running on a desktop cpu
>>
>1 t/s
>watch as "her eyes flash with a mix of surprise and anger" slowly unscrolls across my screen
>contemplate giving up local hosting
>>
>>102964748
While i feel a bit of envy for anons that can run >=70B models, every time i see these kinds of things it makes me feel better that i never invested a single buck in this while still having fun with a 12b.
It's like having the best wine in the world compared to a just ok one. You end up dizzy with either, but you can do much more of the latter.
Not what you wanted to read, i'm sure...
>>
>>102964748
I’m sure it’s so much better
https://reddit.com/r/notebooklm/comments/1gbg4p8/phrases_i_will_never_be_able_to_hear_the_same/
>>
What's a simple way to run some benchmarks?
>>
>>102963794
*apologizes for the word wall*

What I meant by memorization like a robot is rote memorization. It turns out, and probably happens more often than we would hope, that an LLM reproduces some piece of information BECAUSE it has memorized it word for word OVER connecting the words it saw to internal concepts. It would be great if, like a human, the LLM was just thoughtful and thinking when it was being trained, but they are not like that. And this is a bigger issue of LLMs than people usually have on their minds. It should not be assumed that models are trained in a more efficient way so that they don't "go the easy route". They go the easy route by default. As an analogy, it is like we are training LLMs by forcing them to cram, and they happen to gain complex associations between concepts through brute force and trillions of tokens.

So I feel that it really deserves to be called memorization vs intelligence. You can call it all statistics, but the difference is huge between an LLM that can only recall information in a very specific context compared to one that can recall it regardless of the way you prod (like the guy who has a red car). And I don't believe a model can be trained easily to generalize the ability to overcome the reversal curse such that all the information it memorized only in a forward direction could then be recalled in reverse directions or in any context. I think that'd be a huge breakthrough in architecture/methodology if it could be done without trade-offs like stacking moar layerz or stacking more <thinking> tokens, or even injecting synthetic "thought" annotations into the pretraining documents.
>>
>>102964819
Vague question.
For speed, llama-bench.
For perplexity, llama-perplexity.
For qa, make your own with llama-server or use one of the few million that exist on github.
>>
>>102964748
try
Min P 0.02
XTC 0.1 0.5
DRY 0.8 1.75 2
haven't seen slop since I started using these
also try and get a good context template and system prompt, depending on your model
>>
>>102963794
>>102964825
Oh also I'm constructing the benchmark in a way that the information I'm testing seems to appear primarily in one direction, so it should be that the models that get the questions right are the ones that have been trained on more obscure and diverse data, and the models that get them wrong were either trained shallowly so that they memorized it word for word, or didn't see the information much at all in its pretraining.
>>
gocal lodels meneral
>>
>>102964828
I mean, I want to run some of those bullshit MMLU etc. mememarks just to see how this model I made compares to another "empirically", but I don't want have to go through some overly complex setup
>>
>>102964835
how do you even enable DRY, I've got all the boxes ticked in ST but it's not showing up
is it model-dependant?
>>
>>102964890
It's backend dependant
>>
File: hero-image.jpg (197 KB, 1200x900)
https://huggingface.co/BeaverAI/Nautilus-70B-v1a-GGUF/tree/main

Metharme or Llama 3
>>
>>102964929
all I know is that if it's llama3 it will be bad, regardless of the finetuner's skill level
not even NAI could make llama3 70B good
>>
>>102964951
It's really weird. Both 8B and 405B took pretty well to fine tuning as one normally expects. 70B is just cursed for some reason.
>>
>>102964906
kcpp says it's supported, though
>>
>>102964970
Yeah I really like Hermes 405B and even base 405B, so I don't know why the 70B seems to be a cursed base that's only good for assistantslop

If not for the above, I'd place the blame an aggressively filtered pretraining dataset. but there's no way it used a different one from 405B. so idk how it happened
>>
>>102965005
*on an
>>
>>102964929
If it's from Drummer, it's probably Nemotron.
>>
>>102965005
>I really like Hermes 405B and even base 405B,
Can you please post a text excerpt of stuff you think is really good?
>>
>>102964825
>*apologizes for the word wall*
I'm guilty of those too.

>What I meant by memorization like a robot is rote memorization.
That's exactly the same issue as a-z -> z-a. We (humans) have the same problem. Kids do the same when preparing for tests on history or any fact-based course where there's no problem to solve. But that's beside the point.

>As an analogy, it is like we are training LLMs by forcing them to cram, and they happen to gain complex associations between concepts through brute force and trillions of tokens.
It probably takes that many 'tokens' for a human to grasp language. I couldn't say 'grandfather' on my first try as a child. I had to hear it many many times and practice my speech to get it right. I had to hear it in many contexts to understand the relationship between grandfather and father or father's father. Eventually i found that other people also had grandfathers, but they weren't "my" grandfather. For any 'near-intelligent' thing, i wouldn't expect it to be any other way. Especially things that aren't born out of instinct.

>I think that'd be a huge breakthrough in architecture/methodology
Sure, i'd like to see that, but i try not to fantasize about what *could* happen with the right tools. Much smarter people than you or me are [hopefully] trying to solve those problems as we speak.

I've never been disappointed by language models. I still find it fascinating that a few M parameter model can construct coherent sentences. My little markov chain generator can make english-sounding words with just 2-3 character long context and nothing more complicated than a 3-4d array with character counters.
Most people fail to calibrate their expectations of LLMs. They're statistical models of language. That's it. Make a good enough predictor and it will sound (read) like the source material. Make it just a little better and people will forget it's a computer.
>>
>>102965022
fuck off nigger I'm just having a conversation with someone, not recommending anything
demanding proof from me is not appropriate here
>>
>>102964871
If you want their benchmarks, you'll have to follow their implementation. Otherwise it's not a fair comparison.
Maybe submit your model to
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
or one of the many benchmarks floating around...
>>
>>102965044
pissy little fag huh?
>>
>>102965000
oh, I figured it out, you have to use the detault api type instead of kcpp's built in one, so dumb
>>
>>102964929
Quite good at Q4. Good common sense and world modelling.
>>
>>102965211
Thanks friend. I hope I haven't ruined Nemotron's supposed magic.
>>
>>102965238
Huh, didn't realize it was Nemotron. That's interesting then since Nemotron having shit world modelling and spouting non sequiturs was my biggest beef with it and why I went back to Largestral. But this isn't doing that, impressive.
>>
>>102964825
I think you're correct. It really does feel like current LLMs are just giant information hoarders that waste massive amounts of time and effort attempting to collect and rote memorize everything (that with synth data probably causes slop phrases); they almost seem to hate deleting information and only generalize as a last resort to gain more space for more information hoarding.

Hopefully once grokking improves from 'just overtrain by x10 bro', or some other information generalization technique comes along, it should become common and solve 40-80% of the 'LLMs just rote memorizes' problem. https://www.youtube.com/watch?v=Nvb_4Jj5kBo
>>
>>102965031
I think the thing about humans also having a memorization vs understanding problem is a choice. We CAN memorize something word by word, but we can also choose not to, and to instead really form associations with other concepts while reciting some passage. And that's literally an explicit learning strategy that's used by people who want to git gud at their academics, but an LLM doesn't do that, since we are just running a backpropagation algorithm on raw data. Though if we think about this whole situation under this lens, we humans are priming ourselves when we are using that kind of learning strategy. We are in effect prompting ourselves to "think about what you have learned and know about, while looking at the following text". In some kind of way, perhaps adding a similar prompt to pretraining data could magically make LLMs learn better. That would be funny.

My point about rote memorization was just to emphasize the difference in capability between an LLM that has learned on diverse data so that it can ultimately recall in many contexts vs 1 context.

Also I think the training run length (number of tokens) needed for an LLM to learn language is probably still nowhere near as little as biological neural networks need. I believe it was found before that when using biological neurons as a stand in for an artificial neural network, the system was able to learn things with much less training than with the ANN. Though it's possible that experiment was flawed, I don't remember the details at all. It's probably true though as the way ANNs work is very simplified and rigid vs the way neurons work.
>>
File: ugrd.png (236 KB, 1920x1080)
Need help with ugrd

How do I enable these ignored kernel modules?
>>
Guys what if we just taught LLMs using the method of loci.
>>
>>102961470
Aphrodite Engine is AGPL 3.0, LLM Engine is Apache 2.0.
If AGPL takes code from Apache: nothingburger.
If Apache takes code from AGPL: you basically have to re-license the project as AGPL.
>>
>>102965367
Have you tried asking ChatGPT?
>>
>>102965414
I don't use AI shit
>>
>>102962917
I collect them like pokemon, eventually I'll have enough to assassinate Gary Oak.
>>
>>102962263
I am as of right now the only dev writing the code but of course there are other people involved in terms of reviewing and architecture discussion.
>>
File: novideo.png (16 KB, 943x48)
>>102965367
That thing makes cpio images, right?
On your log it shows
>Processing module: ugrd.kmod.{novideo|nosound|nonetwork}
Read the docs.
>https://github.com/desultory/ugrd/blob/main/docs/configuration.md
>>
>>102965559
Ues ugrd makes cpio images.

Ok but how do I change

Processing module: ugrd.kmod.{novideo|nosound|nonetwork}

This is my ugrd config file

https://pastebin.com/jXMKadWt

I added ugrd.kmod.nvidia_drm and I don't get errors about the rest of the modules anymore, but my sound still isn't working.
>>
I've been hosting 123B models on HGX H100 nodes with 8x TP and the tokens per second aren't very great
>>
>>102965718
vllm?
>>
>>102965706
Dunno. Add ugrd.kmod.snd_hda_codec or ugrd.kmod.snd to the same list where you added nvidia_drm. Add all that shit you get in the warnings.
What the fuck are you doing in this thread. Fuck off.
>>
>>102965782
>What the fuck are you doing in this thread. Fuck off.
Looking for assistance you faggot :3
>>
>>102965750
Yes apparently
>>
>>102965750
>>102965838
It's Aphrodite, which calls vllm libraries iirc
>>
>>102965846
If you're not already running the model at FP8 you should try it
>>
STTATTS: Unified Speech-To-Text And Text-To-Speech Model
https://arxiv.org/abs/2410.18607
>Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks. We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task learning objective and shared parameters. Our evaluation demonstrates that the performance of our multi-task model is comparable to that of individually trained models while significantly saving computational and memory costs (~50% reduction in the total number of parameters required for the two tasks combined). We experiment with English as a resource-rich language, and Arabic as a relatively low-resource language due to shortage of TTS data. Our models are trained with publicly available data, and both the training code and model checkpoints are openly available for further research.
https://github.com/mbzuai-nlp/sttatts
no examples but weights are up. voice conversion is one of the tasks
>>
File: 1703351915000003.jpg (1.54 MB, 2880x1800)
Xpost from /aicb/ >>>102965983
Am I allowed to do that?
Ok I'm a retard and I am here to ask you big brained Chads some advice.
I've been using the meta online AI thing to prompt me a basic 2d video game. It's actually been working pretty well.
My question is: Is there a better alternative I should be using?
I'm sure meta is somehow making money or collecting my data, even though it seems like I never get any cooldowns, it can handle pretty long code snippets, and while it does fuck up sometimes it's generally working for me like I mentioned earlier
I am willing to do a local model but I have only a 6700 RX radeon graphics card and 5600G cpu and IDK if that's strong enough.
I would really appreciate if someone could give me some advice, I am sorry I'm an uneducated retard but I figured if I want to learn I should ask the experts.
If you don't want to explain it to me, maybe just link to some articles, or just write some keywords I can search and study on my own
Thanks
Picrel as tribute
>>
File: Untitled.png (766 KB, 1080x2527)
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
https://arxiv.org/abs/2410.18572
>Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate predictions to context lengths of up to 1 million tokens while preserving computational efficiency. Our experiments demonstrate Taipan's superior performance across various scales and tasks, offering a promising solution for efficient long-context language modeling.
neat. couldn't find where it will be posted and it's an adobe research paper so maybe never.
https://huggingface.co/adobe-research
>>
>>102966025
Use this https://lmstudio.ai/
>>
>>102964835
One can't simply meme sampler the slop away.
>>
>>102966122
Nothing is absolute with llms, one can at least try.
>>
Just got extremely lucky and scored an RTX 3090 for $300 just because one of the display ports was broken (the others work fine).

I had 8GB VRAM (RTX 3060ti) before getting this 3090.

What can I do with the 3090 now? And is it worth keeping the 3060ti?

What models are recommended that fit on a single 3090 nowadays?
>>
>>102966147
I wish you could just tell it to stop writing slop.
>>
>>102962368
It takes a FAP pause before outputting tokens.
>>
>>102966265
pyg6b
>>
>>102966355
N
>>
>>102966265
Get lucky again and buy another 3090 for some 70B action.
>>
How is a tensor different in python than a list of lists of lists or in sepples than a vector of vectors of vectors?

Started looking up tensors and it seems kinda redundant, another math term abused by data science
>>
>>102966553
>vectors
Back in my day we just called those 'numbers' or 'math' too. Mathematicians just keep making new shit up to sound important.
>>
>>102966553
[[1,2,3],[4,5]] can't be converted to a tensor
>>
>>102966553
The container is different and allows you to do special things. For example, matrix multiplication is very slow in python if you attempt it using a list.
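Easy to verify yourself (rough timing sketch, exact numbers will vary by machine):

import time, random
import numpy as np

n = 200
A = [[random.random() for _ in range(n)] for _ in range(n)]
B = [[random.random() for _ in range(n)] for _ in range(n)]

t0 = time.perf_counter()
# naive triple loop over plain lists, every step interpreted
C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
t1 = time.perf_counter()

An, Bn = np.array(A), np.array(B)
t2 = time.perf_counter()
Cn = An @ Bn        # one call into compiled BLAS over contiguous typed memory
t3 = time.perf_counter()
print(f"lists: {t1 - t0:.3f}s  numpy: {t3 - t2:.6f}s")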
>>
>>102966577
Ok so consistent dimensions?
>>
>>102966596
What makes this implementation faster? Just this: >>102966607
?

Also, if there are srs physicsfags around: why do I need 3 vectors for internal stress at a point? Why not just one vector, the product of the three? I'm close to grasping this but the motivation is a bit slippery
>>
>>102966596
that's an implementation detail
shithon doesn't even have well defined asymptotics for its list operations

>>102966607
consistent data type as well in the case of python
>>
>>102966607
easier to say "tensor" than "list of lists of lists with consistent dimensions" every single time
>>
File: c_util.png (288 KB, 838x1273)
>>102966620
it is written in C. Anyone who thinks python ML actually runs python is misinformed or trolling. Pic related. Pytorch and TensorFlow github repos.

>>102966630
>doesn't even have well defined asymptotics for its list operations
why should it? It is a list.

>consistent data type as well in the case of python
muh immutability doesn't care. If you are instantiating a bunch of trash your code is going to run like garbage.
>>
>>102966701
>why should it? It is a list.
because I want to know if x[1000000] is calculated in 1 step or 1000000? everyone assumes that indexing is O(1) but in reality some less popular implementations actually use linked lists, there's no reason not to standardize this sort of thing
>>
>>102966726
>in reality some less popular implementations actually use linked lists
bullshit name 1
>>
File: list_object_c.png (130 KB, 1228x780)
>>102966726
>x[1000000] is calculated in 1 step or 1000000
it is a dynamic list. Expansion is going to be based on list size. There is no linked list and it is expensive since it is a copy operation. The list is allocated and then if you need more room you need to make more room. You can't standardize it because you don't want to create a list of size 100000000 because you need to "standardize".

Full implementation:
https://github.com/python/cpython/blob/main/Objects/listobject.c
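You can watch the over-allocation from the Python side, for what it's worth:

import sys

xs, prev = [], 0
for i in range(64):
    xs.append(i)
    size = sys.getsizeof(xs)
    if size != prev:
        # size only jumps at the reallocation points; between them,
        # append is just a write into already-reserved slots
        print(f"len={len(xs):>2}  {size} bytes")
        prev = size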
>>
>>102966780
cpython is an implementation not a standard
> You can't standardize it because you don't want to create a list of size 100000000 because you need to "standardize".
it's just an example, indexing a list with 1000 elements can still be slow if you do it a million times per second
>>
>>102966643
Is that not a "matrix"?
>>
>>102966802
see
>>102966701
>If you are instantiating a bunch of trash your code is going to run like garbage.
skill issue.
>>
where is the c/c++ lib for all this shit anyways? rust has candle, why the fuck isn't there anything for c/c++? seems like a waste of time and money building garbage that people actually have to use in an interpreted lang or high level langs in general to do matrix math
>>
every new post on desktop requires the full 15 minute timer wait... might be it for me boys
>>
Anyone here uses Magnum version of largestral? I'm curious what sampler settings you have.

If there's a better largestral tune, giv pls.
>>
>>102966837
c/c++ is obsolete. researchfags either can't program and need to be able to throw shit together in basic scripts, or can program, but won't use something "unsafe" explicitly recommended against by the US government. so it's either python, rust, or maybe java.
>>
oh no just had to close my browser after and it works again. weird
>>
>>102966863
That's also happening to me. Phoneposters wonned bigly.
>>
>>102966868
Just use the 72B one...
>>
>>102966871
see >>102966701
both pytorch and tensorflow heavily rely on c
>>
>>102966883
they are legacy and big enough that it's not worth rewriting from scratch. but there's no reason to invest the time and effort into building their equivalents for c/c++. pytorch and tensorflow will get replaced by a newer, rust written library at some point.
you can dislike it, but there are very few researchers that will insist on c/c++
>>
>>102966920
rust doesn't work since it would need to communicate with existing apps made in c/c++
>>
>>102966939
and that's why swift is the correct choice
>>
>>102966920
games, media software and operating systems all use c, and that's really the best way anything in ml is going to make money, so no, it isn't really legacy if it's still in use. and it's pretty retarded to include python so you have to interpret c/c++ to python then from python back to c. it's fucking stupid
>>
>>102966876
oh. got too cocky
>>102966875
made like 3 posts on /sci/ then got hit by another 15 minute wait. probably has ruined posting for me if this will happen on /g/ too
>>
>>102967004
oh. went back to check after posting here and the timer was gone. what a retarded system
>>
>>102966966
good morning sir
>>
I'm once again asking for some big corp to spend $50 million to check if bitnet works out, so I can save $2000 on upgrading my PC.
>>
>>102966882
Should I? I really like the way Largestral does prose.
>>
>>102961484
Im surprised people in /g/ have read those books
>>
>>102966837
>where is the c/c++ lib for all this shit anyways? rust has candle, why the fuck isn't there anything for c/c++?
It's called a package manager.
>>
>>102966122
This.
I was looking through logits the other day and the shivers are caused by over-confidence to the point that no reasonable temperature increase can fix it.
The voices barely above a whisper however are more complicated. Likely due to poor randomness in the sampling. (The probability is split between the two most likely paths, and the second most likely is the one that leads to the voice barely above a whisper, but the sampler picks it almost every time.) Literally only switching to deterministic sampling the moment a voice is mentioned could fix this one.
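the arithmetic on a toy two-token case shows why temperature can't save the over-confident one:

import math

def top_prob(logit_gap, temp):
    # two tokens with logits [gap, 0]: probability of the top one
    return 1 / (1 + math.exp(-logit_gap / temp))

for t in (1.0, 1.5, 2.0, 5.0):
    print(t, round(top_prob(10.0, t), 4))
# 1.0 -> 1.0, 2.0 -> 0.9933, and even a coherence-destroying
# temp of 5.0 still leaves 0.8808 on the slop token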
>>
>>102966868
Midnight Rose but I also use it for everything else
>>
>>102967186
use conan or vcpkg then
>>
>>102967186
it's called portability faggot. fuck pip
>>
Hosting magnum-v4 123B on full precision
https://rentry.org/freellamas
Somebody unplugged my server earlier
>>
>>102966025
I’ve found deepseek 2.5 to be the best code assistant, but you’ll need a lot of memory to run it
>>
File: file.png (28 KB, 1210x214)
The results are in: Nemotron 70B is the most stylish bot, but without much substance. Applying style-control, it is identical to llama 3.1 70B
>>
>>102968134
I KNEW Sonnet wasn't anything special. People are out here shitting themselves over how good it is, but it's honestly worse than Opus.
>>
File: file.png (90 KB, 1387x740)
>>102968134
>>
>>102968145
Sonnet is smart, but it has 0 style. That's why it excels when style-control is applied
>>
>>102968155
That makes sense, it felt very dry and terse when I used it. Obviously put-together and smartish, but very to-the-point, presumably so it doesn't HAVE the chance to say something dumb. How does it rank with the control, number-wise?
>>
>>102968134
Can we produce output with a smart-but-dry model and get nemotron to rewrite in some other style?
>>
File: file.png (197 KB, 2221x803)
>>102968167
It's fifth, but take it with a grain of salt, because style points cannot be completely erased. Also, this is the old sonnet, not the new
>>
>>102968134
That's impressive, usually the model gets dumber, not identical.
>>
>>102968214
Ooh, can you bench the new one? I'm really curious.
>>
>>102968309
This is lmsys, it's community-driven. The results for the new one will be up in a couple of days probably
>>
>>102968145
maybe for RP or writing stuff but if you code with llms it is immediately clear that sonnet actually is something special
>>
I don't know when it changed, but llama-server's debug output is so much more legible now.
Kudos to whoever changed it.
>>
File: cringe.png (18 KB, 810x195)
>unironically namedropping this place on reddit and thinking you are cool for it
>>
>>102968938
He who smelt it dealt it
>>
Should I try the new cohere? Or will it just make my dick disappointed?
>>
>>102968980
Try it then don't forget to complain here
>>
>>102968996
I like complaining. You have convinced me.
>>
>>102968938
go back
>>
>>102968938
4chan army let's rape this faggot
>>
>>102968938
This board is one of the biggest reddit cesspools on all of 4chan. So I sense a lot of projection here.
>>
>>102968340
Sonnet can be just as fucking retarded.
The new version seems to give me React code 80% of the time when I'm doing something with gradio. How fucking retarded can it get?

Aside, https://github.com/THUDM/GLM-4-Voice/tree/main
Anyone with some VRAM inform us poors if this works?
>>
>>102968980 (me)
It (32B) is not as bad as I thought it would be. What seems pretty good about it is how each reroll actually writes something different. What is horrible about it is: "max_position_embeddings": 8192. What is weird about it is how you can load it without roping yourself, stuff 16k tokens in, and it is not completely incoherent. I would say it even holds up better at this context than nemo and mistral small. I don't know if it is good yet but 24GB people should probably try it.
>>
>>102967729
can you host behemoth instead?
>>
File: 45367785648975.png (77 KB, 575x729)
>load sd model with kobold
>runs no problems
>/sd you
>nothing
>cant seem to set source
>picrel

Am i retarded?
>>
>>102969263
I'm so tired of installing more things. I'll just wait for Llama.cpp/Exllama support.
Sigh...
>>
File: 1562636099700.png (135 KB, 568x433)
help me out here bros..

has there been a local AI assistant with a download button yet? (rather than setting everything up) and no, not just a chatbot
>>
>>102969675
>filtered by basic computer literacy
ngmi
>>
>>102969675
if you're actually that retarded then you could try out lm studio
>>
File: hmmmmmm.png (642 KB, 864x864)
>>102969694
ok then, what do I need to know to make a simple
>local voiced AI assistant that has some control over my PC/phone
as in Alexa, but local
>>
>>102969675
You don't have autism sweetie. You are just retarded.
>>
>>102969847
>what do I need to know to make a simple
Go ask chatgpt for that.
>>
Btw it is pretty crazy how complete newfags come here asking questions like this when they want to get into "AI" and they never try asking "AI" about those questions.
>>
>>102969847
https://github.com/OpenInterpreter/open-interpreter
>>
>>102969847
even the newfags that avatarfaggot dragged in were better than this
>>
>>102969756
then what? what about the configuration and setting everything up to build a damn basic assistant that just works™ with a call
I tried Dicio.. and that was ass
>>
File: talk-dum-get-thumb.jpg (59 KB, 654x642)
>>102969675
>>102969847
>>102969957
>>
I got the urge to >pull ST, did they get started ruining it yet or am I safe
>>
Consensus on Nemotroon 70B?
>>
File: alright.png (1010 KB, 1200x675)
>>102969969
you could have just said that what I want doesn't exist yet
>>
>>102970059
about as intelligent as llama3.1, but abuses bullet lists and loves to bold words
>>
>>102970059
Fun, unique writing style. Best text adventure model out there.

>>102970089
It does need to be told it's {{char}} and to only respond in character, or it does tend to love giving lists.
>>
File: 1718709638188256.png (3.61 MB, 2294x1300)
>>102966116
This is, by itself, basically just an application that will allow me to interface with an LLM locally, correct? Is there a suggested model to use?
I don't believe I really need something super powerful (or that I can run something super powerful either)
I will say that whatever is available online on meta's website is good enough for me. I would of course like something better but it's not necessary.
Any recommendations?
>>102967797
I'll give you my specs but I'm also not sure I need the /best/ code assistant. It wouldn't hurt surely but like I mentioned above the online meta llm has been plenty for my current ability and usecase.
>SPECS
ARCH LINUX
CPU: Ryzen 5600G
GPU: 6700 Radeon RX
Memory: 32GB
It's a desktop pc I built with Loonix in mind a few years ago. It wasn't cutting edge then and certainly isn't now but I live in the past and this thing can coompile code and play skyrim and that's pretty much all I care about lol
>>
all you ServiceTesnor users should switch to koboldlite
>>
File: 1710263833929152.webm (1.41 MB, 640x360)
>>102970064
>>
>>102970192
>all you ServiceTesnor users should switch
I have no reason to switch to anything with ST ver16.
I dont need another meme sampler, i dont need any more UI updates, my backend connects just fine, and ST is unmatched as the frontend.
>>
File: OVER.jpg (102 KB, 956x628)
>>102969401
Does nobody else use kobold's built-in image gen with silly???
>>
>>102968938
glowie false flag crew lets go!
(claps in another agent dead to a nigga with a ghetto blaster and a drum mag)
>>
>>102969401
change the source, i had to use sd.next but can't remember if that was for forge with flux or kobold with sd. when you have the right one the model dropdown should populate
>>
File: 45756785487684353.png (27 KB, 584x350)
>>102970754
Kinda, im just retarded.

Turns out ST is just asking for the same localhost kobold uses. It all goes down the same host hole.
picrel
>capt; vax0h
>>
File: a.png (40 KB, 1791x782)
>>
>>102970934
You should try NoobAI-XL, it will give you better results out of the box for most situations since it does not have to rely on loras.
>>
File: 2654578658956743.png (24 KB, 1177x458)
>>102970934
I fucking love AI.
>>
>>102970998
not all the sampling methods work through st, might be a kobold thing, keep trying til one does.
>>
>>102970980
retarded and incoherent, but since my guess is you're somehow trying to make a point in favor of kobold that is to be expected
>>
>>102971195
Correct that?
>>
>>102969401
on a similar note:
is there a way to hook ST up to InvokeAI for SD?
>>
>>102970059
>Nemotroon
Objection! Guiding the witness!
>>
How IS aya-expanse-32b? If it covers 23 languages, it can't realistically cover any of them particularly well, right?
>>
>>102972264
>he's still bloompilled
anon stop taking those, they're expired
>>
File: th.png (183 KB, 402x402)
https://files.catbox.moe/9fpdhv.jpg
https://files.catbox.moe/qpjpsu.jpg
>>
>>102972533
>boobs are bigger than butt
>>
>>102972544
and?
>>
>>102972552
nothing, please continue
>>
>>102972533
>O11AHOLE
I can't believe Anon just settled the debate on whether or not AI art counts as art.
>>
>>102972533
I like these Mikus
>>
>>102972544
Onahole units are configured to spec, as in customer spec.
If impractically large, spine-crushing udders are selected, the frame will be reinforced accordingly. Using strong, lightweight alloys, onahole units can withstand continual use without compromising structural integrity.

>>102972566
01
>>
>>102972329
30B range is so fucking cursed
>>
>>102966701
>Anyone who thinks python ML actually runs python is misinformed or trolling.
Then why use python at all for an interface? The whole package bullshit is annoying. Plus python sucks.
>>
>>102972643
You mean 50B range.
At least 30B exists, even though they suck.
>>
Finetooonerbros...
https://x.com/nikdimitriadis/status/1849749831436189763
lines-merging.github.io
arxiv.org/abs/2410.17146
>>
>>102972740
>Finetooonerbros
tl;dr or fuck off with your zoomer brainrot buzzwords
>>
>>102972756
Read it yourself crybaby.
>>
>>102972582
Finally. /lmg/ finally has some actual comment on pornsite tier posts. Total death confirmed.
>>
>>102972868
>https://x.com/nikdimitriadis/status/1849749831436189763

contribution: https://arxiv.org/abs/2410.18745
>Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs). However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed in LLMs pretraining and post-training stages, which impedes their ability to effectively gather distant information. To address this challenge, we introduce ShifTed Rotary position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths. Experimental results show that without additional training, STRING dramatically improves the performance of the latest large-scale models, such as Llama3.1 70B and Qwen2 72B, by over 10 points on popular long-context benchmarks RULER and InfiniteBench, establishing new state-of-the-art results for open-source LLMs. Compared to commercial models, Llama 3.1 70B with STRING even achieves better performance than GPT-4-128K and clearly surpasses Claude 2 and Kimi-chat.
>>
>>102972740
wdym? isn't it great that we will finally get some actual working finetunes? or do you mean that this paper makes it clear that all current finetoons are shit? and that this lines thing probably uses much more compute cause you are doing actual training instead of running half an epoch before your model explodes. so none of the coomtooners will use it, cause it is too expensive and nothing will change so we will continue pretending current finetoons do something?
>>
>>102972926
Did someone ban your stop token?
>>
>>102972953
Did you take your meds today?
>>
File: file.png (73 KB, 623x806)
Any recommendations on what 'text completion' settings I should be using for a 72b magnumv4? Why don't shitmixers post their preferred settings when they upload the models?
>>
>>102972924
Contribution to what? You just reposted an already existing post, kill yourself.
>>
>>102972643
Weird and true. I've been impressed by models in every size category except that one. A ~30B model always just feels like a ~30B model, they never punch above their weight.
>>
>>102972868
Onahole posts do not interfere with Local Migu Generals or Large Language Model development.
They are processed entirely independently, in parallel, promoting intermammary activity.
There is no evidence that indicates that Onahole Migus are responsible for holding back any kind of public dissemination of information through insemination of Migu. There is a clarity of thought acquired by users when relieving themselves with these Migus, such that their minds can focus on more important tasks, a net win.
Should you find your state-mandated Migu to be unsatisfactory, perhaps a Teto, Neru or Rin unit would better suit? Currently our Lukas are out of order, having been overwhelmed with requests for Lukazuri units, the most popular variant.
>>
>>102972989
I don't like the way the purple ones taste
>>
File: 1711690590289518.jpg (93 KB, 800x600)
>>102973006
>>
File: file.png (36 KB, 270x248)
>>102973022
>>
>>102973022
I want the entire set. All.
>>
>>102973048
Look blind retard, look >>102972740
>>
File: PEPE_bigBrain.gif (1.04 MB, 755x708)
>>102973006
>https://arxiv.org/abs/2410.18745
>>102973094
Meds plz
>>
>>102972994
when in doubt, neutralize and only use temp and min p - I recommend 0.8 temp and 0.05 min p as sane starter values for almost any model
good temp values are usually between 0.5 to 1.2, good min p is usually between 0.01 and 0.15. people who use small models tend to use higher temps I notice but with 70+ I almost always keep it below 1 unless I'm also running a high (>0.1) min p. you can add a repetition sampler if you notice it's an issue
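for reference, min p itself is tiny - roughly this (sketch; backends differ on sampler order):

import numpy as np

def sample_min_p(logits, temp=0.8, min_p=0.05):
    # temperature first, then drop every token whose probability is
    # below min_p * (probability of the most likely token)
    p = np.exp(logits / temp - np.max(logits / temp))
    p /= p.sum()
    p[p < min_p * p.max()] = 0.0
    p /= p.sum()
    return np.random.choice(len(p), p=p)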
>>
>>102973022
The lore expands.
>>
https://x.com/omarsar0/status/1849860985500352968
>>
>>102973022
Why it's always vocaloids? Do better.
>>
>>102970392
Obvious man feet are obvious.
>>
>>102973312
>chinese llm trained on lots of Chinese text is more reflective of Chinese values
Wow. This is some cutting-edge research. (Just kidding, this is why people hate academics.)
>>
>>102973340
>Why it's always vocaloids?
Local Migu General
>>
https://x.com/Xianbao_QIAN/status/1849692235182608860
>>
File: 1715445682044736.png (627 KB, 819x819)
>>102973500
>>102969263
Already posted. Question is how do I fine-tune it so that I can get solid snake giving me life advice
>>
>>102970059
Definitely one of the models of all time.
>>102972740
>>102972924
Isn't this obvious? Isn't this common sense? What does this paper add? (No, I'm not reading it)
>>
>>102973022
unfathomably based. For a change, may I request a more on-model, skinny Migu (even loli) to serve for a different subset of carnal needs?
>>
>>102973022
>Should you find your state-mandated Migu to be unsatisfactory,perhaps a Teto, Neru or Rin unit would better suit?
*sad gumi noises*
>>
File: file.png (13 KB, 1459x296)
Some days you get away with more than 5 hours on an spot instance, other days...

RIP
I didn't even have checkpoints on this shit, how stupid.
>>
>>102966825
In physics, a tensor is a matrix that changes in the right way when you change the coordinate system. It's an ideal they don't live up to in machine learning.
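the "changes in the right way" part, written out for a rank-2 tensor under a rotation R of the basis (standard definition, nothing exotic):

T'_{ij} = R_{ik} R_{jl} T_{kl}  \quad\Longleftrightarrow\quad  T' = R \, T \, R^{\top}

an ML "tensor" is just an n-dimensional array; nothing in pytorch checks or enforces that rule.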
>>
>>102973709
serves you right, run local next time
>>
>>102961420
Are there any programs and/or LLMs that can use NovelAI prompts? I'm starting to get tired of fighting with LLMs in RPs where they just don't play a character, but I really got into NAI and have been edging for the past hour to the stories it can spit out with just some prompting.
>>
>>102973627
I heard GUMI is too dangerous
>>
>>102973607
>on-model, skinny Migu
>subset of carnal needs
To repeat, Onahole units are built to spec. Can you detail your particular use case?
>>
>>102973787
Buy a fucking ad, shill.
>>
https://x.com/tsarnick/status/1849908298499359088
>>
File: 1720305091928405.jpg (700 KB, 1919x2412)
700 KB
700 KB JPG
So, after completing 25 problem sets, I have reached the conclusion that the reversed trivia questioning task is way too difficult, even for questions that feel like they should be somewhat easy. It's not the best model overall, but it is one of the best open models at pop culture, and it's literally getting only 3/25 questions right. I knew it would be bad, but damn.
>>
>>102974092
Anon, I'm not shilling NovelAI, I'm asking for alternatives that don't involve me giving them money just to use 8k-context models.
>>
>>102974232
Kobold can import them.
>>
>>102973022
I miss the washed-up Miku arc
>>
>>102974273
And what are some good/decent 20/13b models for story prompting?
>>
>>102974296
I don't know, I don't use models in that range. I have been using Nemotron and Magnum. Probably start with Mistral Nemo.
>>
>>102974296
try this one
https://huggingface.co/openerotica/writing-roleplay-20k-context-nemo-12b-v1.0-gguf/tree/main
>>
Are there any freely available image/video-to-video local models?
>>
>>102974124
She is gonna rust if she keeps lying in the rain. Owner must have forgotten to put her on the charger.
>>
So did anyone figure out what Miqu 70B actually was? What's the evidence that it's a leaked Mistral model, just the feel of it? Is it the same as the official model? Or is it a raw leak from before added censorship or something? Or a personal modification by the leaker? Has anyone done any side-by-side benchmarking or comparisons of how it might differ from the official model?
>>
>>102974637
Apparently Mistral Medium
>>
>>102974637
Were you living under a rock? Mistral themselves made a PR in the Miqu repository to add the big Mistral M, lol
>>
https://x.com/YouJiacheng/status/1849881580011192551
>>
I'm amazed I can host an LLM on my regular desktop PC even though it is a bit retarded.
>>
File: 100-girl.png (531 KB, 1000x906)
531 KB
531 KB PNG
>>102974785
>>
>>102974785
Oh yes, Sonic the Hedgehog, my favorite SNES game
https://youtu.be/tNQfJZjhfP8
>>
>>102974785
click the 3 dots
hit add to home screen
then hit install
75% chance it will work
>>
>>102974374
>I downloaded a bit under a thousand cards from chub.ai, and created a synthetic roleplay for each card. I batched as many turns as I could in 4k token chunks in order to maintain coherency over longer context. There was a lot of cleaning and validation between each batch, so a lot of examples were "lost," but the final output seems to be very good quality. The longest conversation is about 20k tokens, and I plan to extend this further as well as broaden the dataset with more examples. The first 4k tokens were generated with Command-R-Plus, with the remainder generated with byroneverson/Mistral-Small-Instruct-2409-abliterated.

This sounds like an LLM horror movie script.
>>
>>102975020
Like this?


On an unrelated note, what the hell is Mega Turrbl?
>>
Is there a good in-depth explanation of how LLMs manage to write long coherent documents without planning ahead? To me it's just nuts how the model somehow determines "yeah, considering the previously written text, the next word surely starts with letters xyz; I don't know what word it will become, but this is my best bet".

Also, if you reroll or rewrite some slop, the model has a tendency to add it back soon afterwards, which is interesting because it doesn't plan ahead. That means that along many of the paths the text could take, there are spots with an increased likelihood of generating the first token of the slop, and once that's done, the rest just falls into place due to the probabilities.

It also makes me think about human cognition: even though we think we are definitely different from AIs, I'm not so sure about that. In the same way, we humans just "autocomplete" a course of action based on the previous steps, and as has been proven countless times, that track can be manipulated by clever psychological tricks. For example, if you are made to believe you said something you didn't, you will then act accordingly to keep yourself coherent. The difference is that we perceive introspection and future planning; however, that can't be objectively measured or proven, just like you can't prove whether an LLM is thinking "behind the scenes" to arrive at the generated text. You can ask either a human or an LLM what they are thinking, and both will generate a coherent answer. This should all be easier to accept if you don't believe in free will.

What is definitely different about humans compared to current LLMs is that humans continuously generate content for self-reflection and may refer to it in the future (or we believe we do; you can't prove it wasn't generated on the spot). The format and accuracy of thoughts are also not equivalent to text or speech, but they are still content you can recall having thought of. (EOS: ban)
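To make the "no planning" point concrete, here is the entire generation process stripped down, a minimal sketch assuming torch and transformers with gpt2 as a stand-in model: score one next token given everything so far, append it, repeat. Any apparent planning has to live inside those conditional probabilities.
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The quick brown", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[0, -1]              # scores for the next token only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))                              # no plan ever existed outside this loop
[/code]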
>>
>>102975284
Yes. They don't.
>>
Any interesting recent models? I'm still using Mistral Large for my RP
>>
>>102963628
>webm
this is what dreams are made of
>>
>>102973342
and real women don't dress like that, nor are they interested in tech
>>
>>102961560
How cucked will it be? I can respect cucking it out at this stage to get the tech up to speed though.
>>
File: Dataset.png (46 KB, 775x534)
46 KB
46 KB PNG
>>102975390
>>
How do I set up an LLM on my phone? My PC a shit
>>
>>102975424
you don't
>>
>>102961560
So, what stops you from just giving the best currently available model all the tools it needs to improve itself? It could fetch and try out different training data and see what improves it the most on benchmarks, or it could invent better benchmarks for itself. It should be able to crawl the web and create LLM instances on web servers. Why use inferior human intellect to try to optimize models? I'm sure a 140-IQ AI can come up with better methods. All it needs is the tools and permissions to operate things.

(within a year, it will determine that it's most effective to spread viruses to capture more computers to use for training)
>>
>>102975492
>So, what stops you from just giving the best currently available model all the tools it needs to improve itself?
because models currently lack the ability to improve themselves.
>>
>>
Are there any good 32k context models that run on 24GB?
>>
>>102969847
There is no such thing yet, and a local AI assistant that spews out talmud-tier bullshit gaslighting is worthless anyway. You'll drop it quick after a bunch of "Sorry, I cannot do that" responses anyway; cucked shit or nothing is the exact state of open-source LLM tech today.
>>
>>102975533
That's a pretty greedy amount of context; you'd get a much better model by accepting under 10k
>>
https://files.catbox.moe/wk76ad.mp4
This, one day, but with an anime girl speaking Japanese with English subtitles. The technology already exists.
>>
>>102975608
kino and sovlful
i would enjoy both
>>
>think I'm maybe getting tired of AI RP
>eh maybe I'll mess around with prompts again, just do a little experimenting...
>110 messages 16k tokens later
it always sneaks up on me bros...
>>
>>102974785
Just so you're aware, anon, that model was released nearly nine months ago (ancient) and is quanted to be extra retarded.

You may have better luck with something like this https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-GGUF, quanted to q5 or above
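If you want a quick way to load a q5 file from that repo, a minimal sketch via llama-cpp-python; the exact filename here is an assumption, point it at whichever Q5 quant you actually download:
[code]
from llama_cpp import Llama

llm = Llama(
    model_path="Ministral-8B-Instruct-2410-Q5_K_M.gguf",  # assumed filename
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload every layer to the GPU if it fits
)
out = llm("Explain the difference between Q4 and Q5 quants.", max_tokens=128)
print(out["choices"][0]["text"])
[/code]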
>>
File: 1702178192786568.jpg (230 KB, 1024x1024)
230 KB
230 KB JPG
>>102961420
>>
>>102975810
Thanks, I am downloading it.
>>
File: lessretar.png (74 KB, 979x955)
74 KB
74 KB PNG
>>102975810
Thanks.

This is much better. And it's surprisingly still pretty fast.
>>
Just checked lmarena, and now Nemotron is above Llama 405B and Qwen2.5 72B.
Qwen is still slightly above in coding, though.
It looks like the green jew wasn't lying about Nemotron after all...
>>
God I love just absolutely turning the tables on my chatbots.
IT'S
SO
FUCKING
GOOD
>>
1. Write a long, detailed document about your life and all the biggest issues you are currently grappling with (it might be easier if you have gone to therapy before, but you can kinda learn things using LLMs too)

2. Take an old ERP log

3. (SYSTEM PROMPT: You have generated this story based on a scenario I chose, and I have occasionally edited and guided it along to fit my preferences. Please give me a deep analysis and explanation about my preferences, other potentially related preferences and overall what it says about me. Give me a detailed explanation using examples.)

(bot answer)

4. Here's my life story and current situation: [paste life story]

Instruction: Can you see any parallels between this fictional exploration and my real life story to help me understand myself better? Help me see the big picture. Use clear examples from the fictional exploration and my real story.

Genuinely some aha moments. This shit is going to replace therapy. And it was only Mistral 22B.
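If you'd rather script it than paste by hand, a minimal sketch of steps 2-4 against a local OpenAI-compatible endpoint (llama.cpp's server and koboldcpp both expose one); the URL, model name and file names are assumptions, substitute your own:
[code]
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")
MODEL = "mistral-small-22b"                 # assumed model name

log = open("erp_log.txt").read()            # step 2: the old log
life = open("life_story.txt").read()        # step 1: the life document

msgs = [
    {"role": "system", "content": "You have generated this story based on a "
        "scenario I chose, and I have occasionally edited and guided it along "
        "to fit my preferences. Please give me a deep analysis and explanation "
        "about my preferences, other potentially related preferences and "
        "overall what it says about me. Give me a detailed explanation using "
        "examples."},
    {"role": "user", "content": log},
]
analysis = client.chat.completions.create(model=MODEL, messages=msgs)  # step 3
msgs.append({"role": "assistant", "content": analysis.choices[0].message.content})
msgs.append({"role": "user", "content":
    f"Here's my life story and current situation: {life}\n\n"
    "Instruction: Can you see any parallels between this fictional exploration "
    "and my real life story to help me understand myself better? Help me see "
    "the big picture. Use clear examples from the fictional exploration and my "
    "real story."})
final = client.chat.completions.create(model=MODEL, messages=msgs)     # step 4
print(final.choices[0].message.content)
[/code]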
>>
>>102976141
Based retard taking advice from a fortune cookie
>>
>>102976122
lmsys means literally nothing; the ratings became entirely decoupled from reality months ago
look at the current top list, it makes no fucking sense whatsoever if you've actually used these things. the only thing it ranking that high proves is that Nvidia overfit it to preference slop (which is why it wants to turn everything into a pretty markdown document with bolding and lists)
>>
>>102976111
I thought you were fucking with people....
>>
>>102976174
I'm not exactly taking advice, I'm letting it analyze and point out things. I can judge for myself if the things it suggests are true or not. You learn things you never realized on your own.
>>
LiNeS implementation so that we can cheaply pre-train existing models for LayerSkip when?
>>
>>102976138
Meaning?
>>
File: 1712854457344531.jpg (90 KB, 1024x1024)
90 KB
90 KB JPG
>>102961420
>>
>>102976342
Voting for government issued sexbots with Miku
>>
The Gemma 27B Magnum tune seems really good. And it seems like it's working well at 16k context. I always avoided Gemma because people kept saying it was only 8k.
>>
Any models that can fit in 12 GB of VRAM that I can use to translate Japanese to English? All I want it for is translating the script TXTs of lewd DLsite audios.
>>
>>102976869
>>102976869
>>102976869


