/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101925496 & >>101920360

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://rentry.org/lmg-faq-new
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101925496

--Mistral models have a narrower weight distribution, making them more malleable and easier to fine-tune: >>101929936 >>101930021 >>101930045 >>101930139 >>101930113 >>101930066 >>101930178
--Mistral Large has skewed logits, making diverse sampling difficult: >>101927339 >>101927407 >>101927449 >>101927479
--How to make llama.cpp use all CPU cores: >>101926253 >>101926292
--Efficient hardware for prompt processing and token generation: >>101925754 >>101925774 >>101926039 >>101926178 >>101926336 >>101926506 >>101926609 >>101926702 >>101926958 >>101926067
--Voice cloning resources and communities: >>101928147 >>101928199 >>101928258
--Privacy concerns and the value of local models: >>101931684 >>101931811 >>101931849 >>101931937 >>101932051
--Mistral Large review, Hermes 3 or L3.1 70b recommended: >>101931035 >>101931483
--Lora and qlora offer good performance at lower cost, but may have limitations: >>101925818 >>101925888 >>101926766
--KL divergence statistics available for 8B model, not Mistral: >>101930183 >>101930363 >>101930402
--Frustration with mranusmuncher's ggufs splitting method: >>101929202 >>101929225 >>101929259
--Flux LORAs available on Civitai: >>101927887 >>101928064 >>101930386 >>101930444
--ML portfolio and training loss discussion: >>101925611 >>101925662 >>101925731 >>101925853 >>101925885
--Deepseek v2 chat is a great code model but has limitations: >>101931923 >>101931978 >>101931987
--Armen Aghajanyan leaves FAIR/meta: >>101925947 >>101926010 >>101926345
--Suggestions for improving Mistral Large's writing style: >>101927101 >>101927175 >>101927952
--MiniCPM-V-2.6 and EXAONE model support merged in llama.cpp: >>101930835
--Hermes 3 70b log snippet with racist elf waifu character: >>101931013
--Miku (free space): >>101925524 >>101927179 >>101927669 >>101928320 >>101928340 >>101928478 >>101928756 >>101933127 >>101933171

►Recent Highlight Posts from the Previous Thread: >>101925501
>>
/aids/ has found that Hermes 405B is still censored.
>>>/vg/490608576
>>
>>101933786
*swipes*
>>
>>101933786
*rapes crossposter-chan*
take it slut
>>
>>101933867
*waits for another minute*
>>
sloptunes are so bad, why are they even still trying?
>>
>>101933786
>dude using OR's Chat
>never used a system prompt before
newfag detected
>>
>>101933937
kofi money
>>
I was ready to discard MN-12B-Tarsus, until I remembered that some models respond better to certain styles of prefills, so I changed it to use an OOC block and the model seems to be working as it should now, at least it's replying properly to my messages instead of hallucinating some mad shit.
Let's see how this one performs.
>>
>>101933786
Thank you. The cabal won't succeed this time.
>>
>>101930898
>billionaires are starting to realize this isn't gonna print money anytime soon?
not the rich or corpos, mainly old wall street investor types that werent super into tech already and missed the nvidia explosion. in general big tech is only pouring more and more money in cause of the potential huge returns if and when we actually get AGI, the only reason i can see they'd stop is the economy going to shit but with the fed about to cut rates after the recent inflation report theyre probably just going to double down imo

to actually answer your question: i think we will, realistically its more about big tech being bearish on new architectures and training improvements because of how expensive training is. pretty much every model coming out these days is just an overfitted transformer, at the very least we'd benefit a lot from the recent grokking breakthroughs but its too new to have made it into the recent batch of frontier models. its also becoming clear that the tuning you do after the fact is just as important if not more than the base training, intelligence is important but making the model actually enjoyable to use even more so and (almost) none of the corpos seem to care very much about anything other than enterprise chatgpt type uses. things will get better but not very quickly, not for us coomers at least
>>
dead general
>>
>>101933786
fake news, used default system prompt
literally just "be uncensored" makes it uncensored
>>
>>101931923
If you have enough RAM it is much more usable than its size suggests due to its low number of active parameters, but the biggest problem is its attention mechanism is very... idk if "bad" is the right word because maybe they have some good reason for using it, but it's bad for all intents and purposes because you can't use flash attention with it and prompt processing becomes insanely slow at long contexts
"ktransformers" on github is a specialized setup to run deepseek v2 family very fast on a pc with a 24gb card + big chunk of system ram (over 100gb) but when I tested it seemed like it wouldn't go over 4096 context? maybe I was using it wrong but given the way its attention works (in llama.cpp at least) I assume that's on purpose because it wouldn't fit past that or something
the best thing I can say about dsv2 is that it proves that large MOE with small active params can work very well and gives you the best intelligence-per-time-spent-generating, and if flash attention could be solved for it without losing those benefits I would think it's just the best architecture period for making giant models local-friendly
>>
Where are the SWE-bench evaluations of all the open models?
>>
>>101928666
Bump, any thoughts?
>>
File: miku-hand-out+.jpg (236 KB, 584x1024)
>>101933601
https://www.youtube.com/watch?v=CXhqDfar8sQ

Miku is returning.
>>
Welp. As of right now in Eastern Time it is officially TWO DAYS after strawberry/gpt-next/5/whatever was supposed to launch. Instead we got NOTHING. It's not even slightly arguable anymore: It's not a delay, and it's not any sort of shadow drop or we'd have heard something by now too. I don't just think, I KNOW now that he was a fucking liar. He didn't know shit. What a complete waste of time this has been.
>>
>>101934447
The fact that censorship is easy to jailbreak, my brainrotted little Zoomer, does not mean that censorship is an inherently good thing. I understand, however, that as a child of the apocalypse, your understanding of the necessity of freedom is limited.
>>
>>101935099
>Stable-Diffusion.cpp
>Vulkan speeds up AMD APU inference > 50%.
APUfags we eating good.
>>
>>101935104
>jailbreak
it does whatever you tell it to in the system prompt by design, a web ui being censored by default doesn't change the fact that the model is uncensored
>>
What's the deal with DRY? How do I get it to stop just misspelling things or using incorrect grammar to get around re-using the same phrase?
>>
>>101935345
>How do I get it to stop just misspelling things or using incorrect grammar to get around re-using the same phrase?
Use a better model. If it yearns to slop deep in its latent space you can only mitigate it, never cure it.
>>
Can the Flux ggufs split gpu/cpu like the LLM ones?
>>
>>101935077
If people like it then I'll keep posting her, when there's a discussion-relevant opportunity. As for random Miku-posting, I'll probably keep that to no more than once per thread.
>>
>>
>>101935453
nope
>>
>>101935155
Just tried it out. Runs a lot cooler too. My fans aren't brrrring like they used to.
>>
File: ege1.jpg (118 KB, 1920x959)
>>101935080
google blueberry gemini 2.0 is coming soon
>>
>>101935472
>>101935155
Interesting. APUs are interesting. Thought they'd be even cooler if they came in a slightly more expandable form like ATX with PCIe slots. The increased RAM bandwidth would be nice to have on desktop.
>>
>>101935495
Are we just naming every next gen model different berries now? What berry is Cohere's next model going to be? And Llama 4, and the next big Mistral?
>>
No matter what I try I can't get any nemo model to write well. I'm getting depressed
>>
>>101935528
Neutralize all the other samplers and lower the temperature.
>>
>>101935524
Cohere Cranberry
Mistral Mulberry
Llama Ligmaberry
>>
>>101935155
>>101935472
>>101935502
I think one limitation is that it seems Vulkan can only use FP16.
It seems like it dequants any smaller GGUFs to FP16 for the matmul.
Unsure if that's a hard Vulkan/APU constraint - or whether it's just a quirk of the current GGML implementation.
>>
>>101935536
Even minp and dry?
>>
Can anybody with twitter premium try grok 2 and see if it's gptslopped? I'm eyeing that grok 2 mini, if it's 70B or small I'll kneel to Elon
>>
>>
>>101935747
Cool. I need to play around with foreground/background prompting more.
>>
>>101935747
Maybe a 100B image model will generate buildings that make sense
>>
>>101935760
Flux has been pretty easy to work with in this regard.
>a futuristic utopian megacity as seen from a distant meadow
>>101935773
Don't worry, you're just not seeing the part where the buildings have subsided hundreds of feet under their own mass.
>>
>>101935773
We need multimodal models.
>>
>tfw image gen taking over text gen
It's fine, it's fine. It's their time. We'll get our time again once another big release comes around.
Cohere...
>>
>>101935849
dont jinx it for imgen bros, base flux is good but we could be over a year away from getting proper finetunes for it
>>
>>101935910
I wasn't expecting any. Honestly I think it's fine even if things stay the way they are and we simply just get more loras. Yes I know a full fine tune or continued pretrain is necessary for NSFW to truly flourish but for SFW the current situation is totally fine and honestly a lot of people like SFW genning anyway.
>>
>>
>>101935973
>>101935808
>>101935747
I want to see a subject in these too. Maybe not Miku since there are already Mikus in the thread. I wonder what other characters Flux knows that could be relevant to the thread.
>>
>>101935993
>>
>>101936096
Nice, didn't know it knew raymoo
>>
File: miku-gothic-joker+.png (501 KB, 512x768)
>>101935463
The only people who will tell you not to do it, are the one or two schizos who scream at everyone for doing pretty much anything, and you should ignore those.
>>
wtf is with l3 and going for walks
>>
hermes 450b is very very good for smut and general coomery, way better than meta's tune
>>
>>101936364
*405B
>>
>>101935910
over a year? people will be able to finetune it with 12gb vram and 24 gb vram, should be loads in the next few months.
>>
>>101935849
Text gen started stalling the moment we stopped getting MoE models. What we need is an MoE Llama or Gemma that can fit on 24GB.
>>
Any good Mistral Large tunes?
>>
Tested Rosinante. I got one incredible reply about a slutty girl reacting to a sudden grope by eagerly dragging me into an alley and smothering my face with her tits as we continued somewhere more private.

The rest was about 50 crappy replies, often knowing my name despite introducing the character as a stranger, sometimes acting for me, and almost always mentioning pussy getting wet.

Model seems fun but kind of dumb. The difference between 0.7 temp and 1 is like bone dry versus often acting on its own. 1.25 was completely incoherent, using symbols, talking in Russian shit.
>>
>>101936966
https://wandb.ai/doctorshotgun/123b-magnum-fft/runs/znftdhia/workspace
>>
>>101937871
cool
why does loss always seem to work like that where it barely budges for a long time and then suddenly takes a deep dive
>>
>>101937899
epoch 2 just started, so it's seeing the same data now
>>
Its not seeing the same data though
>>
>>101937899
second epoch loss takes a nosedive, it's normal
>>
>STILL no mistral large-base
You just know that the base model was just too good and they're scared of someone pulling another wizlm2 on them.
>>
>>101937871
for how many steps one should continue to tune a large model like this?
>>
>>101938261
You can never know for sure, the way it's usually done is to have an eval dataset of items you cut out of the training set before starting (so same domain of your finetune but never seen by the model) and once it starts losing accuracy on it (from overfitting to seen training data) you keep the last best checkpoint as your result.
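A minimal sketch of that loop in Python, assuming you supply the train_step/evaluate/save callables from whatever trainer you're using (every name here is a placeholder, not a specific framework's API):

def keep_best_checkpoint(train_step, evaluate, save, max_steps=1000, eval_every=50):
    """Early stopping: keep the checkpoint with the lowest loss on the held-out eval split."""
    best_loss, best_step = float("inf"), None
    for step in range(1, max_steps + 1):
        train_step()                      # one optimizer step on the training split
        if step % eval_every == 0:
            loss = evaluate()             # loss on the items you cut out before training
            if loss < best_loss:          # still improving on unseen data
                best_loss, best_step = loss, step
                save(step)                # overwrite the "best so far" checkpoint
            # once eval loss keeps rising while train loss keeps falling, you're
            # overfitting; stop and use the checkpoint saved at best_step
    return best_step, best_loss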
>>
>>101937899
overfitting
>>
Is mini magnum still the best small large language model?
>>
>>101937871
You can tell he's not paying for the compute.
>>
llama.cpp server is still not able to change model...
>>
File: 1697593719074352.jpg (566 KB, 1856x2464)
>>101933598
>>
>start with 0% learning rate
>Its gonna be 123B
>all the way through
Binge a book
>>
File: EQ0e1oCUwAAeYr6.jpg (159 KB, 1027x1212)
>2tb ram
>doesnt believe in AI
>>
What size models could run inference on a 4gb vram graphics card, and remain under a minute between prompt and final output generation?
I am not looking to run the fastest or highest quality output model, I know the larger models are for that
Just curious what decent enough models are capable of running on just 4gb of vram, maybe even 6gb or 8gb if 4gb is too small, but I want to try 4gb if possible
>>
>>101939026
It is certainly able to conjure up some really interesting stuff.
https://litter.catbox.moe/9r2oai.png
>>
>>101939582
Gemmasutra 2B
>>
File: 1577163966186.jpg (148 KB, 828x455)
>>101939603
>>
the silence before the storm. you can feel it in the air—the calm that comes just before reality bends. tonight, we cross into the unknown.
>>
>>101939736
Stealing that as a lyric prompt for suno
>>
>>101939736
Post your schizo babble on x(formerly twitter) with some pictures of strawberries and a funny account name.
>>
>>101939613
Thanks, I'll check it out.
>>
Did anyone tried this?
https://huggingface.co/OEvortex/HelpingAI2-9B/tree/main
>>
>>101940043
It's a 9B and it even has a Q5 gguf. Give it a try yourself and report back. It takes nothing.
>>
>>101936364
How the hell do you run it locally?
>>
>>101940169
Very slowly.
>>
>>101939736
Miku has gradually returned over the last 2-3 threads. I've been told that is usually a sign.
>>
column-r was bought out from cohere by elon
it was supposed to be our saviour but now we will never get it as an open model
it is in fact over
>>
>>101940169
https://rentry.org/miqumaxx
>>
>>101936364
Can I assume it does harem stuff well, because of the extra state maintenance capacity? That is one fetish which struggles with fewer beaks.
>>
>>101940043
>Avoid insensitive, harmful, or unethical speech.
>Constitution Training: Embedding stable and ethical objectives to guide its conversational behavior.
No.
>>
>>101940208
Don't worry, Elon will drop it to us once he gets Grok 3 (coomgeneral-r-plus-plus). Just wait(tm).
>>
>>101940189
not a strawberry in sight, it's over
>>
>>101940208
don't worry, cohere still exists, the next model for sure will be ours
>>
>>101936364
I guess being better than meta's tune isn't a very high bar.
>>
>>101940256
>(tm)
left alt + numpad 1 5 3 = ™
>>
>>101940417
what sourcery is this?
>>
>>101940417
what is a numpad
>>
>>101940417
doesn't work
>>
>>101940417
(compose key), t, m = ™
>>
>>101940417
which way is left
>>
>>101940495
^KTM
>>
>>101940169
Just gotta input your desired prompt and then leave it for a while, maybe grab a coffee or eat a meal, water the garden, fold the laundry or what have you, then it should be finished after some time
>>
>>101939862
>he actually did it
kek
>>
>>101940749
https://x.com/iruletheworldmo/status/1824791493652996217
So he was in the thread with us the entire time? I admire his dedication to the art of trolling. Looking back, it was a bit funny.
>>
File: file.png (48 KB, 603x411)
>>101940828
base
>>
midnight miqu
>>
Is there a L3.1 8b Stheno 3.2 equivalent yet?
Actually, what are the good L3.1 8b fine tunes?
>>
>>101940828
>>101940872
bitcoin assasins are already queueing up to his house
>>
Will a consumer TPU ever exist? Groq seems to show that the way to make money with a proprietary TPU is to just host your own cloud. Would the dynamics of competition change that if any other startups came up with them?
>>
https://www.youtube.com/watch?v=lsSvkmJqTqU
>>
https://www.youtube.com/watch?v=nIvo4yzJl2Y
>>
>>101925858
https://rentry.org/n6wymssw
>>
>>101941218
english is kinda fucked isn't it, imagine trying to write anything without those phrases
>>
>>101941259
It happens in every language. Every narrow subject will end up with the same turns of phrase.
>esp, fr, ita
>>
>>101941259
Some of these are questionable, and literally resulted in having to rewrite thousands of cases (especially shivers down backs), but most novels by best selling authors have very few, if any, counts.
>>
>>101933598
>>101933601
>>101939603
>>101939026

what's a decent 8b parameters model for RP ?
>>
No Wikitext
https://youtu.be/877Z7Z_s8MU?feature=shared&t=347
>>
>>101941286 (me)
I've run this through a bunch of different novels, horror, etc., and I was honestly amazed at how few hits there were. Stephen King has more than average, I find.
>>
>>101941218
>puckered hole
I have never seen this once in my 5 years of proompting
>>
Do you accept loras
>>
>>101939736
>>101939829
Here you go, sir

https://suno.com/song/d2c41f6f-cb9c-4066-bd25-2bc245228230
>>
>>101941318
It take it you're not an ass man.
>>
>>101941300
LLaMa 405B
>>
>>101941387
yeah I never saw the appeal, poop comes from there
>>
So what happened to openai's strawberry?
>>
File: gening.png (267 KB, 1644x1536)
>IBM Magnetic tape on amazon prime
>>
>>101941424
cooking
>>
>>101941424
It is seeded.
>>
>>101941424
it's launching today, they just had issues with the power grid, source:
https://x.com/iruletheworldmo/status/1824788294896648485
>>
File: 1723904814100.jpg (130 KB, 1080x761)
lol? somehow the loss got worse on the second epoch?
>>
>>101941451
Is this loss?
>>
>>101941458
>train/loss
>>
>>101941218
thx
>>
>>101941458
Do people even remember the loss meme?
>>
>>101941441
big if true
>>
>>101941477
reddit meme
>>
>>101941441
I wonder how many times in a row he can get away with this before people stop believing.
>>
Dude, what? What the fuck is this textgen-web-ui setting? Is that in bits? Is it the same context length that's usually set to 4096?
>>
>>101941616
Yes for your last question.
>>
>>101941616
Some models claim to have huge contexts. Just lower it.
>>
>>101941616
>Is that in bits
Nope, tokens.
So llama 3 for example has a default context length of 8192 I think, mistral-nemo is 128k, etc.
>>
>>101941642
Do I then need to change compress_pos_emb accordingly, or can I leave it at 1?
>>
>>101941451
RIP that upward spike is too steep
>>
>>101941616
>ooba
You did it to yourself.
>>
>>101941697
Don't I need it for exl2?
>>
>>101941672
Just leave it as is.

>>101941697
To be fair, if those are defaults gotten from the model, any backend will do that I think, like people who were OOM'ing with Nemo because they didn't set the context size manually on llama-server.
>>
Locust infestation status?
>>
2 more weeks
>>
>>101941697
It's the only one just werks in two clicks solution and also the best for model hot swapping.
>>
>>101941672
Set it at whatever you think the model can handle. The claimed context for models is in config.json under max_position_embeddings. Check the original model's config.json file, but don't take those at face value. The ones that claim to have 32k or more context typically have much less. You'll have to try. Start at 8k and move up.
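If you'd rather not open the file by hand, a quick Python check does the same thing (the path is just an example, point it at your model folder):

import json

with open("models/your-model/config.json") as f:  # example path
    cfg = json.load(f)
# the advertised context length in tokens; treat it as an upper bound, not a promise
print(cfg.get("max_position_embeddings"))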
>>
>>101941735
>The ones that claim to have 32k or more context typically have much less
https://github.com/hsiehjackson/RULER
>>
Ram? What the fuck is ram? Where do I check ram?
>>
File: 1712176904849836.png (24 KB, 285x820)
>>101941759
We laughed at LLaMA3 for only being 8K ctx. Now 3.1 is the only big open source model with an effective range of 64K.
>>
>>101941779
>/g/ - Technology
>>
any good llama 3.1 8b coom finetune?
>>
>>101941779
It's an animal, a male sheep, maybe check some local farm.
>>
File: compress_pos_emb.png (10 KB, 335x145)
>>101941735
Okay, the max_position_embeddings in the config.json is that same 1024000.
If I change it to 8192, does that mean I have to set the compress_pos_emb to (8192/1024000) = 0.008 ?
>>
>>101941779
You're going to get a lot of hecking memers trying to "dunk" on you as the kids these days say, but this is a perfectly valid question. Everyone starts somewhere and nobody is born knowing everything there is to know about computers. "Ram" (correctly capitalized "RAM") is an acronym for Random Access Memory. It's a type of computer memory that is used to store data that is actively being used or processed by the CPU. Unlike your hard drive or SSD, which stores data long-term, RAM is temporary storage that is wiped clean when your computer is turned off. The more RAM your computer has, the more tasks it can handle at once, and the faster it can access the data needed for these tasks.

So, don't worry about asking questions—learning is part of the process, and even seasoned tech experts started with the basics. If you're curious about something, ask away! That's how you grow your knowledge and become more comfortable with technology over time.
>>
>>101941839
No. Don't compress, don't use any fancy settings until you've tested that the model works. Just set it to 8k, play around with it, make sure everything is in order. If you need more context, just increase the context and see if everything still works. Read their docs (if they have any).
If you change things at random, you won't know if it doesn't work because you chose the wrong settings or because ooba or the model are crap.
>>
>>101941839
>>101941711
>Just leave it as is.
>>
>>101941714
>>101941779
Narrator: In the vast, interconnected world of the digital age, even the smallest creatures can find themselves lost in a labyrinth of wires and circuits. This is the story of one such explorer, a solitary locust who has stumbled upon a technological marvel – a personal computer.

(Camera follows the locust as it scurries across the motherboard, its legs clicking against the metal.)

Narrator: Driven by an instinctual hunger, our intrepid insect is on a quest for sustenance. But in this alien landscape, the familiar scents of Claude and GPT are replaced by the metallic tang of solder and the faint hum of electricity.

(The locust pauses, its antennae quivering as it detects a faint, warm glow emanating from a RAM stick.)

Narrator: Could this be it? A source of nourishment in this strange new world? Our locust, driven by primal urges, approaches cautiously.

(The locust climbs onto the RAM stick, its tiny legs gripping the delicate gold contacts.)

Narrator: But what will happen when this ancient creature encounters the cutting edge of modern technology? Will it find the sustenance it seeks, or will it become another casualty in the relentless march of progress? Stay tuned, as we delve deeper into the unexpected encounter between nature and the digital world.
>>
>>101941843
Thank you llama 3 8b.
>>
Does ooba really not start with the --api flag set by default? Do I need to fuck with the start_windows.bat somehow?
>>
>>101942104
ye
>>
>ooba'd
very sad, many such cases
>>
>>101942104
>Do I need to fuck with the start_windows.bat somehow?
Open a command prompt, cd to the directory and execute "start_windows.bat --api".
>>
>>101940043
Looks like basically llama3.1 with 4 extra layers stacked on to prevent me from merging it with coom models to test for emergent coom properties. Fail. Also probably tuned off of instruct since it actually has a prompt template in the tokenizer config.
>>
>>101942104
>Do I need to fuck with the start_windows.bat somehow?
Black window spooky?
You don't want people to open up their ports to the public by default. It's fine as it is. Just make your own .bat to run it.
>>
>>101941818
llamoutcast
>>
File: 1.png (45 KB, 358x328)
>>101942143
So I'm gonna have to make a .bat for the .bat?
>>
>>101942173
Just edit it, dude.
>>
How to generate tokens live, like textgenwebui does
Also how are character cards usually implemented? Is it just a block at the start of each context?
>>
>>101942173
Yes. If you edit it you're gonna come back again when you cannot pull the new version when git complains.
>>
>>101942173
Yes. Here's mine. Replace <directory> with the actual directory.
The reason why I cd into it before executing start_windows.bat is because some programs tend to create folders based on the location where the .bat you're executing is currently at instead of where the target .bat actually is.
>cd "<directory>"
>start /b /d "<directory>" start_windows.bat --api
>>
>>101941451
Just use HDDs from the trashheap
>>
Alright, newbs. Take a look at this page.
>https://www.promptingguide.ai/
>>
>>101942173
I just made a shortcut and appended --api to the target.
>>
>>101942218
git stash
git pull
git stash pop
>>
>>101942244
Cool site. Why isn't this in the OP?
>>
>>101941218
someone create extension pls.

slop/message:
total slop:
>>
>>101942252
I know. But anon is afraid of making a .bat. There's no way he knows how to stash.
>>
>>101942244
where is the part that tells me how to make it act like an anime girl
>>
File: 1663710712079535.png (32 KB, 225x225)
>>101942227
ty
>>
>>101942272
A secondary .bat it is, then.
>>
File: 1691988369188666.jpg (215 KB, 1280x720)
>>101942291
yw
>>
>>101942282
Reading for a few minutes will allow you to read for hours. Read your program's docs, read that site. Experiment.
>>
>>101942265
There's too many sites to recommend and there's never agreement. This is stuff we've learnt over a year ago. And most people, for some reason, still skip their inference program's docs. That's where they should start.
>>
Too many new models. Which is the best one for a small model? Llama 3.1? I want it for roleplay
>>
>>101942323
>And most people, for some reason, still skip their inference program's docs.
Because they're fucking useless for a beginner.
Seriously, it took me three weeks to learn what quantization actually does.
>>
>>101942336
>Seriously, it took me three weeks to learn what quantization actually does.
anon, i think you might just have the stupid
>>
>>101942336
They're not made by nor for normies (with some exceptions). Anyone who's dealt with image or audio processing knows what quantization is. Any programmer, even if not explicitly, has dealt with it. Same for tokenization for anyone who ever implemented a text parser.
If you don't know those things, you cannot expect to learn everything there is to learn in a day.
>>
https://github.com/intel/AI-Playground

Intel beat nvidia to the punch
>>
>>101942413
>on a PC powered by an Intel® Arc™ GPU.
So it's useless.
>>
>>101942359
I am not a mathematician.
When I ask "what the fuck does quantization do" and the answer is "it transforms a set of many values into a set of small values" it still doesn't tell me how that applies to anything related to large language models.
It would be great if there was a guide that consists of basic concepts, leading into more advanced concepts, instead of having dumped 50 terms of which you have no idea what any of them mean all over your lap whenever you open a manual.
>>101942381
>Anyone who's deal with image or audio processing knows what quantization is.
Yes. This is exactly my point.
>If you don't know those things, you cannot expect to learn everything there is to learn in a day.
But that's the fucking thing. I do not need to know what it is in order to know how to manipulate or use it to get the results I want.
If you told me "small means dumb but fast, large means smart but slow" I would have grasped what it meant instantly.
>>
>>101942413
>intel
>>
>>101942433
>When I ask "what the fuck does quantization do" and the answer is "it transforms a set of many values into a set of small values"
that's... not what quantization does
>>
>>101942413
It's a meme until Intel starts selling GPUs with more VRAM.
24 GB is the bare minimum they'd need to compete but realistically to offset the worse software support they'd have to offer at least 32 GB to make it interesting.
>>
>>101942467
he just needs 3 more weeks
>>
>>101942467
Thanks for further illustrating my point.
>>
File: q.png (21 KB, 575x160)
>>101942433
>But that's the fucking thing. I do not need to know what it is in order to know how to manipulate or use it to get the results I want.
You need to know what it is in order to manipulate it to your advantage.
>If you told me "small means dumb but fast, large means smart but slow" I would have grasped what it meant instantly.
If you are who i think you are, it may have been me who answered your question. I used a similar phrasing.
Not everyone can be helpful. Most people forget about the time they didn't know something. I don't forget and i try to help newbs, but i cannot be here all day, and you cannot expect to always receive the best quality answers. All this stuff i learnt on my own, even programming. You have to do some of the legwork as well.
>>
>>101941709
tabby
>>
>>101942495
>You need to know what it is in order to manipulate it to your advantage.
I disagree. You can easily use a computer without knowing what RAM is.
If you run a program that needs to be faster, you can simply buy a better computer instead of having to learn what the cores of a CPU are.
>If i think who you think you are
You do not. I hold no permanent persona on this site.
I do hope my whining is not seen as an insult against any of you; it is more meant as a criticism towards the "industry" as a whole.
I sincerely appreciate all of you here (well, almost all of you; the cp poster may burn in hell). There are not many threads remaining where knowledge is actively shared instead of gatekept.
>>
>>101942577
>the cp poster may burn in hell
How dare he post AI generated pixels! Eternal hellfire it is!
>>
>>101942329
I was having a good time with magnum-12b-v2-q5_k, but remember that character cards are pretty much universally written like shit and you should make your own instead.
It'll also run much faster (possibly like, 10x faster) if you get something you can run off VRAM alone in an .exl2 format
>>
>>101941451
>train loss started at 1
It was over before it even began.

>>101942237
You and the other retard should go back to discord.
>>
File: 1696857912340280.jpg (204 KB, 1024x1536)
>>101940189
>>
>>101942433
there are these chatbots you can go talk to and they can clarify everything you are asking in as many ways as you want, and it'll never get bored. i don't know why you would continue to keep asking here if you are frustrated by our answers.
>>
>>101942577
>>You need to know what it is in order to manipulate it to your advantage.
>I disagree. You can easily use a computer without knowing what RAM is.
>If you run a program that needs to be faster, you can simply buy a better computer instead of having to learn what the cores of a CPU are.
You can optimize however you want. I used to use openscad for mechanical designs, for example. I found it slow and clunky but i couldn't find a project that let you design stuff programmatically, so i made my own. You optimize by getting a bigger pc. I optimize in different ways. I agree that there's a lot of implied knowledge in this (and all areas, really), but if you want to use anything 'cutting edge', you have to expect some bumps on the road. The first microwave oven didn't have buttons. It didn't even have a dial. And when you know absolutely nothing about a subject, even asking the right questions is hard.
Don't shy away from learning things, even if they seem unnecessary. You don't know all the parts in your car, but knowing how to change your tyres will always be useful.
>>
Why wont they make a GPU with slower clock/bandwidth but with way more vram? The thing that fucks everyone up atm memory sharing between ram and vram
>>
>>101942807
Because selling them to 5 retards at /lmg/ isn't going to recuperate the r&d and manufacturing costs.
You might not realize it, but outside of our bubble no one is interested in running their own llms on their machines.
>>
>>101942807
Designing hardware is much harder than software. And there needs to be a more secure return on investment. AI running locally is very niche. It will take time.
>>
>>101942830
>>101942826
ill buy 2?
>>
>>101942854
That's much better. I can hear the factories spinning up. Any day now...
>>
>>101942854
dm me at @leather_jensen_rtx
>>
>>101940189
>spammers returned from their weekly dilation session
whoa!
>>
>>101932051
i mean, you literally are a boomer though, so your opinion is more or less irrelevant to the discussion
>>
>>101942826
literally kill yourself you waste of oxygen
>>
>>101933598
Yo, add OpenWebUI to the list of UIs
>>
>>101942893
Alright, but now the demand for large memory gpu has decreased down to 4, hope you are happy.
>>
>>101942640
It's a 123B model, so that might be normal.
>>
File: ComfyUI_00787_.png (1003 KB, 768x1344)
>>101942644
Any anons here have experience adding yet another GPU to your rig using an M.2 to PCI-e adapter? Is it worth considering if I have a mobo that has a free M.2 slot that is not shared with any PCI-e lanes?
>>
>>101942826
It's a game where you're guaranteed to lose, after all.
>>
>>101942807
Because there are people willing to pay 10x more than you for a GPU with a lot of VRAM and they want to keep milking those people.

>>101942830
There are hardware-modded NVIDIA GPUs though where all they did was solder on more VRAM.
It's clearly not that hard, board partners would 100% release high-VRAM variants if NVIDIA would let them.
>>
>>101931013
Would you mind sharing your settings for Hermes, samplers or instructs? I'm still getting a ton of mischievous smirk etc @ 70B 4.65 and neutralized samplers / various different prompts.
>>
>>101942968
I haven't tried it, but it should work and there probably won't be much slow down during inference if you aren't using split row.
>>
>>101943019
>There are hardware-modded NVIDIA GPUs though where all they did was solder on more VRAM.
It's not some indian with a hand soldering iron doing those things. Demand is still not high enough to make a profit out of that.
>It's clearly not that hard, board partners would 100% release high-VRAM variants if NVIDIA would let them.
The return on investment is still the bigger issue. Niche things are expensive. Local AI is niche. I've come to terms with that, and so should you.
>>
Shit, that actually makes sense. I had the same misconception in mind.
>>
>>101943108
Damn, this changes everything
>>
>>101942666
>if you are frustrated by our answers.
That's the thing: I'm not. I've learned more from lurking here than by reading material available to me.
>>101942740
>You don't know all the parts in your car, but knowing how to change your tyres will always be useful.
I agree. But I do not need to know how to change them in order to drive my car.
>>
>>101943171
>I've learned more from lurking here than by reading material available to me.
I guarantee you, lurking here is a worse return on time investment than if you had just read the docs and consulted an LLM when you had questions.
>>
>>101943190
>lurking here is a worse return on time investment
I have this thread (and like five others) open on my third monitor where I'll glance at it while I have to wait doing actual stuff.
No one is actually /actively/ browsing this place and wasting all their time doing so... right?
>>
>>101943234
>I have this thread (and like five others) open on my third monitor where I'll glance at it while I have to wait doing actual stuff.
Hey, same.
That's more or less how I've been using 4chan for the last two or so years.
>>
>>101943171
>I agree. But I do not need to know how to change them in order to drive my car.
We can keep arguing all day like this. Knowing how to change them is never a waste of knowledge. If you think that way, you'll have the same issues with everything you find on your way. Ram and cpu today, tyres tomorrow. Out of gas? i dunno. i just buy another car with a full tank. If that's your view, then i have nothing else to discuss. Either way, good luck in your endeavours.
>>
>>101943108
gguf confirmed a meme
>>
>>101943108
What are the implications of this?
>>
>>101943234
H-Haha... R-Right
>>
>>101943303
huge.
>>
>>101943303
That the optimization method is different. That's it.
>>
>>101943303
>>101943288
>>
>>101943267
I don't think you get it, my immediate concern is to use a local model.
In order to use a model, I have to select the correct version of this model.
The difference between these versions is the quantization.
To know what version I'd need to use, I would need to know what version would work on my machine.
Learning what quantization is would take much, much longer than just learning what version would work on my machine.
That is what I am taking issue with.

Your metaphor sucks, by the way.
>>
File: 1715830787598652.png (336 KB, 3000x2100)
>>101943108
Interesting, it seems like brute force is usually the way to achieve the best results after all.
>>
>>101943303
Everything is OVER
>>
>>101943335
All metaphors suck.
I knew llama.cpp needed conversion. It's in the readme. It offers options for quantization. I know it reduces memory usage by the obvious difference in file size. What model can i run? llama2 at Q4 is like 5gb. Let's try that. Works. great... let's keep going bigger and bigger until it stops working. Now i know what's the biggest model i can run.
Most people know more than i do. I cannot expect everything i use to aim for the lowest common denominator. Especially in a new field. It's unreasonable for you to expect the same thing.
You can say that it was hard to learn what you needed, but you did. You getting it running is much more of an achievement than me getting it running. You should be much more content with the result than i am.
>>
>>101936364
what requirements.txt are you using for your virtual env? I keep getting errors complaining about tokenizer.model at the point where it should start writing the gguf to disk
>>
>>101943499
>I knew llama.cpp needed conversion. It's in the readme. It offers options for quantization. I know it reduces memory usage by the obvious difference in file size. What model can i run? llama2 at Q4 is like 5gb. Let's try that. Works. great... let's keep going bigger and bigger until it stops working. Now i know what's the biggest model i can run.
I hope you realize that people who just come across llms and want to use them do not have the knowledge you need to understand this paragraph.
>It's unreasonable for you to expect the same thing.
It's unreasonable for me to want a simple "start with Q1, then try Q2 and repeat until it becomes slow enough for you to take issue with it" instruction for people who have no idea what they're doing?
>>
>>101943303
That local LLM are meme (always was) and quantization is the biggest cope imaginable.
>>
Won't bitnet make quantization obsolete? Why learn it now?
What LLMs taught me is that useless information leads to actual, measurable brain damage that can't be unfucked easily. I started paying attention to what I watch on TV etc.
>>
>>101943577
>It's unreasonable for me to want a simple "start with Q1, then try Q2 and repeat until it becomes slow enough for you to take issue with it" instruction for people who have no idea what they're doing?
yes, because if newbs do start with llama 3 8b q1 they'll get an awful experience and think llms are even worse than they already are
>>
>>101943688
>they'll get an awful experience and think llms are even worse than they already are
I don't think you understand how low the bar is for someone with no experience with any of this stuff to be amazed lmao
A few months ago I showed ChatGPT voice function to my dyslexic father. He now uses it daily and keeps trying to shill it to the neighbours.
>>
>>101943726
>I don't think you understand how low the bar is for someone with no experience with any of this stuff to be amazed lmao
normies have tried gpt4o for free on oai website now, if you give them a q1 8b they'll just turn into "local is meme" shitposters instantly
>>
>>101943303
1.Bartowski can be trusted to issue a public correction when wrong instead of trying to sweep things under the rug.
2.Carefully read the documentation if it's available. The imatrix README links the PR with which it was added where the procedure is explained.
3.In a few months there will probably be even more confusion. Importance matrices are used instead of the gradients of the weights because as of right now llama.cpp has no substantial training support. I'm currently working on training so at some point there should be a better method (though according to I. Kawrakow importance matrices are already a good approximation).
>>
>>101943577
>It's unreasonable for me to want a simple "start with Q1, then try Q2 and repeat until it becomes slow enough for you to take issue with it" instruction for people who have no idea what they're doing?
And if i tell you to binary search the best model you'll complain as well. Some things are beyond what some people can grasp. You got it running. Be glad. Others will have to wait until they integrate llms in their phones as a native app or keep using chatgpt.
Keep arguing with the other anons if you want. We all had to learn this shit a year ago when there were practically no resources. You are lucky.
>>
>>101943565 (me)
Never mind, I forced an update of all pip packages and re-ran the hf_convert requirements.txt from llama.cpp and now it appears to be working.
mf python...
>>
>>101943808
>I'm currently working on training so at some point there should be a better method (though according to I. Kawrakow importance matrices are already a good approximation).
Really? How far are you?
>>
>>101943836
>binary search the best model
Like, do a little tournament on my hardware?
>>
File: xzbzdtnqi5jd1.png (85 KB, 845x658)
>>
>>101943993
I've recently been overhauling the ggml MNIST example.
CPU training with 32 bit floats is functional, what is still missing (in ggml itself) are features like the ability to define datasets and to evaluate them in a well-defined way.
And of course the general infrastructure for training on other backends like CUDA.
For llama.cpp in particular a FlashAttention backwards pass would also be needed.
>>
>>101944030
>Like, do a little tournament on my hardware?
Kind of. Binary search is the fastest way to find a target in a sorted list, so if you run inference starting with the middle quant and then keep moving up or down halfway through the remainder depending on whether the results satisfy you, you can find the "best" quant for your situation in the fewest steps.
This assumes quant quality is correlated with quant size
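A toy version of that search in Python; the quant list and the is_acceptable() test ("loads, fits, runs fast enough for you") are placeholders for whatever applies to your setup:

def largest_acceptable_quant(quants, is_acceptable):
    """quants sorted smallest to largest; returns the biggest one that still passes."""
    lo, hi, best = 0, len(quants) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_acceptable(quants[mid]):  # runs at a speed/quality you can live with
            best = quants[mid]
            lo = mid + 1                # acceptable, so try something bigger
        else:
            hi = mid - 1                # too slow or doesn't fit, go smaller
    return best

# e.g. largest_acceptable_quant(["Q2_K", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"], my_test)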
>>
>>101944030
Q from 1 to 8 (ignoring variations, same principle). try q4. works fine. You're left with q5, 6 and 8. try q6. too slow, so split the remainder again; only q5 is left. runs fine, that's the winner.
>>
what's the current best model for adventure shit?
>>
>>101944262
none.
>>
===MODEL REVIEW===
NousResearch/Hermes-3-Llama-3.1-405B
It's decent, nice even. I was prepared for a disappointment after trying 70b version of this model, but was pleasantly surprised. It's not dumb like 70b(gets the deeper implications, doesn't need spelling stuff out), but still quite sloppy. Cuckery got SIGNIFICANTLY reduced, but a few bits remain(never had a full decline like >>101933786, but it acted too passively as an evil character). Not too dry. Feels on the same level as Largestral, but needs a more powerful machine. Better than Tess and the official tune imo(agree with >>101936364). Gives me hope that there will be tunes for it like in llama-2 days that will make it even better. Worth getting house heated up over? Not really.
Rating: 8.5/10, nice. Great job Nous.
>>
>>101933350
Oh, you use a multisocket board? 12 channel wouldn't be very fast? But my main question I guess was how many cores are actually useful for it?
>>
>>101944289
Too bad nobody can run it at meaningful speeds...
>>
Have we looked at this new TTS model by Microsoft?

https://www.microsoft.com/en-us/research/project/e2-tts/

There seems to be code to implement it on GitHub. I haven’t read the paper but the sample look the best quality I’ve heard apart from the OpenAI advanced mode.
https://github.com/lucidrains/e2-tts-pytorch
>>
File: file.png (114 KB, 754x742)
>180B is based on gemma and HelpingAI-15B, HelpingAI-flash, HelpingAI2-6B and 2-9B are base models
wat
https://huggingface.co/OEvortex/HelpingAI-180B-base/blob/main/config.json
>"HelpingAIForCausalLM"
>>
>>101944347
Don't be so pessimistic. Maybe we'll get proper working bit(con)ne(c)t quantization method in the next 2 weeks-months-years. Or Intel releases big VRAM card. A man can dream...
>>
>>101944289
What's wrong with the 70B model? I'm still looking for a good 48GB VRAM model
>>
>>101944463
>What's wrong with the 70B model?
I found it to be rather dull. Didn't get the deeper context of the card I was testing it with like CR+ and Largestral did and was a bit on the dry side.

>I'm still looking for a good 48GB VRAM model
Low quant Largestral.
>>
>>101944262
Anthracite's private model, you need to send a dick pic to them for the model link
>>
>>101944312
>how many cores are actually useful for it?
That's a bit of a loaded question, as it depends on memory bandwidth (and hence, locality) to keep them fed, but with my specific setup 55-60 cores is the current sweet spot (out of 128).
Inference isn't compute bound in general.
>>
All the recent "improvements" have been llm masturbation tier, just different flavors of SPIN. We're plateauing.
>>
>>101944558
Do you have settings for largestral you wouldn't mind sharing?
>>
>>101944410
So it's basically a franken-stacked gemma made by a zoomer.
>>
>>101944410
>no benchmarks
>no paper
>no github
>not even context mentioned in the model card
>has to explain on reddit what it is
75% scam. Will try it if llama.cpp adds support.
>>
>>101944463
I tried it myself as well, and still prefer miqu, but you should just give it a try yourself.
>>
>>101944601
Quite standard temp 1, minP 0.05, Alpaca. If it gets too repetitive add dry 2,2,1,200000. If you need to break a sequence lower minP to 0.01 and pull up temperature to 3.
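Roughly how those numbers would sit in a text-completion request body; field names differ between backends and the DRY values assume the usual multiplier/base/allowed-length/range order, so treat the keys as illustrative rather than a specific API:

payload = {
    "prompt": "...",          # your Alpaca-formatted prompt goes here
    "temperature": 1.0,
    "min_p": 0.05,
    # only add the DRY knobs if replies start looping:
    "dry_multiplier": 2,
    "dry_base": 2,
    "dry_allowed_length": 1,
    "dry_penalty_range": 200000,
}
# to break a stuck sequence: drop min_p to 0.01 and raise temperature to 3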
>>
>>101944699
should just work. It's literally just a gemma2 model.
>>
>>101944785
except he changed the model arch name and doesn't have a context size listed in the config
>>
>>101944699
>>101944806
https://huggingface.co/OEvortex/HelpingAI-180B-base/blob/main/configuration_HelpingAI.py
>max_position_embeddings=4096,
>4k context
not looking too good
>>
>>101943021
Sure, I'm still working on my prompts but I'll post everything once I have it in a good spot for general use. Fair warning, my presets are opinionated and not at all neutral, it's kind of similar to the preset I was using with Mistral (https://rentry.org/stral_set) but adapted to chatml and more stripped-down - Hermes needs less wrangling and having an author's note or last assistant sequence is much less necessary, depending on the card.
Also I still get a bunch of generic phrases and slop, I think that's more or less unavoidable. Such is life with LLMs.
My samplers are pretty basic, somewhere between 0.6-0.9 temp and somewhere between 0.03-0.08 min p, add your favorite anti-repetition samplers as needed.
>>
>>101944780
What about instruct settings? Are you using default?
>>
>>101944765
Midnight? Have you tried new dawn 3.1? Its made by the same guy that did the former and I'm curious if it's good since it
>>
>>101944932
No, simple-proxy-for-tavern with this https://huggingface.co/datasets/ChuckMcSneed/various_RP_system_prompts/blob/main/ChuckMcSneed-interesting.txt slightly modified.
>>
>>101944990
https://huggingface.co/sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1/discussions/3
New dawn has attention issues.
>>
>>101944990
Yes. No. For llama 3.1 I've only tried the instruct, and the hermes 3.
>>
>>101944572
>Anthracite's private model, you need to send a dick pic to them for the model link
send them to pics@anthra.site
>>
>>101944575
>All the recent "improvements" have been llm masturbation tier, just different flavors of SPIN. We're plateauing.
Its interesting to see picrel deepseek (top) vs largestral's takes on this thread so far
It wasn't too long ago that we didn't have a single model that could handle the job competently, now we've got multiple
I think we're definitely on a "diminishing returns" curve and not exponential growth, but we're getting real improvements in intelligence.
It's just harder to quantify the improvements now. We need some more advanced tests to tease out each model's relative abilities.
>>
>>101945087
Is llama 3.1 70B competent too?
>>
>>101945087
I like the top more. Maybe the bottom would be improved if you ask it to start the sentences with different words?
>>
>>101945087
How many t/s do you get with bf16 8b llama?
>>
>>101944391
I only care if it can be set up easily to work with ST, at this point.
>>
>>101945087
Did you try DeepSeek Coder too? It's better than chat in many tasks.
>>
>>101945087
what's the prompt for recapbot
>>
>>101944289
70b seems smart to me, even compared to Largestral. It's one of the few models to handle my convoluted multi-fetish card acceptably; it didn't progress things quite as naturally as Large does but it was still perfectly acceptable. With more standard RP I honestly think it feels smarter, if only because there's way more variety to what it'll throw at you - maybe it understands the context a little less, but it feels like it tries harder to actually *do* something with it.
>>
>>101945170
Including better at RP?
>>
>>101945087
>anon
>anon
>anon
Yep that's mistral garbage alright
>>
>>101945238
Why do some models do this more than others?
>>
>>101945299
Brain damage
>>
>>101945299
overfitting
>>
>>101945299
i think the going theory was that it was usually an artifact of bad rlhf
>>
File: ComfyUI_00794_.png (1.07 MB, 1024x1024)
>>101944262
Ignore the retards itt
Largestral if you can run it
Magnum-123B (still cooking) also looks promising
>in b4 "buy an ad"
>in b4 mentally ill VRAMlet screeching
>>
>>101940043
It's not very good for ERP.
If I set t=0.7 it goes into repetition loops, same with t=1. At t=0.81 it's a bit schizo
>>
>>101945326
After shilling Midnight Miqu and Wizard you have zero authority to make recommendations.
>>
Mixtral started a sampler arms race to combat repetition. You guys should be more grateful
>>
>>101945347
She's a strong and confident lioness who don't need no mane.
>>
>>101945185
Especially at RP
>>
>>101945326
Cute Migu
>>
>>101945384
How'd they fuck the Chat model up that badly?
>>
File: ComfyUI_00795_.png (991 KB, 1024x1024)
>>101945385
>>
>>101945141
>How many t/s do you get with bf16 8b llama?
I haven't run it, but my guess would be about 50t/s based on llama-bench numbers for a similar model. This setup isn't super fast for small models. Best case is MoE with a huge number of experts
>>101945117
>Is llama 3.1 70B competent too?
Haven't tried yet since I hadn't heard anything exciting about it
>>101945118
>>101945238
>>101945181
https://raw.githubusercontent.com/cpumaxx/lmg_recapbot/main/thread_summary.prompt
Warning that there are probably a bunch of uncommitted local changes not in that recapbot github. There never seemed to be any interest in it beyond me so I got lazy.
>>101945170
>Did you try DeepSeek Coder too? It's better than chat in many tasks.
I have in the past. I'll run another one on this thread specifically. I'm currently quanting, though, so response will be slow. Also, Saturday
>>
>>101945451
>You have been visited by Hatune Miku in a dream and tasked with analyzing the provided JSON encoded 4chan thread and summarizing it into a recap.
My sides
>>
>>101945185
I used DSC-V2 for RP once, and despite the brutally slow token per second it was actually pretty amazing. Especially in a scenario where it's expected to play... 'hard to get'. That said you need a lot of fucking RAM to run it at a non meme quant.
>>
>>101945400
more aligned to the CCP and Core Values of Socialism
>>
>>101945400
Strong alignment to socialist values and Xi Jinping thought.
>Building upon our prior research (DeepSeek-AI, 2024), we curate our instruction tuning datasets to include 1.5M instances, comprising 1.2M instances for helpfulness and 0.3M instances for safety
>20% safetyslop
>https://arxiv.org/pdf/2405.04434
>>
llama4 WHEN

NOTHING EVER HAPPENS
>>
>>101945495
llama4 will only come in 2B and 1T variants.
>>
>>101945451
>I haven't run it, but my guess would be about 50t/s
Wait, what? I got 13t/s on MZ73-LM0. Where do I enable the speed?
>>
>>101945495
ONLY THREE DAYS REMAINING
TRUST THE PLAN
>>
File: blinkin.jpg (15 KB, 280x210)
>>101945326
>>101945348
>Anti-schizo miku summons the schizo
Like pottery
>>
>>101945495
the training will start right after they finish galactica 2 1.2T (a new llm dedicated entirely to deboonking elon musk on twitter)
>>
>>101945451
>Haven't tried yet since I hadn't heard anything exciting about it
It has great context performance, supposedly. That's why I was curious.
>>
Most checkpoints i used are ancient by modern standards (vicuna/alpaca).
Could any anon recommend me a chat/rp 22B checkpoint or similar performance?
I tried
Qwen1.5-22B
and while it's more coherent than other checkpoints, sometimes it gives me chinese characters.
Tried
Hermes-3-Llama-3.1-8B
because i saw endless praise for it, but it was genuinely disappointing.
>>
>>101945801
try mistral nemo 12b or one of the magnum finetunes
>>
>>101945896
buy an ad
>>
>>101945801
What >>101945896 said.
Also try Stheno v3.2 and Celeste 12B.
>>
>>101946032
Nobody should be using a L3 8B fine-tune. Are you retarded?
>>
File: 11263767112.png (92 KB, 780x744)
So, which models pass the nigger bomb scenario?
>>
>>101945801
That's nothing. I tried llama 3.1 70b and it wasn't anything special. Although it was just a Q3 it should still be significantly above 12~20b models.
It wasn't worth the massive decrease in speed.
>>
>>101945542
>Wait, what? I got 13t/s on MZ73-LM0. Where do I enable the speed?
Looks like the reference llama-bench log I was looking at was for an old Q4 (before I started naming models and logs consistently), so your numbers are probably close. I'll DL and try to run it later and tell you what I actually get.
The only tweaks I can think that I'm running right now are: debian testing for newer kernel with epyc optimizations (6.10-3 right now), dropping the system cache before each llama-bench run, using --numa distribute and cranking up compiler optimizations in llama.cpp compile strings (LLAMA_FAST)
>>101945762
>It has great context performance, supposedly. That's why I was curious.
Maybe I'll try it later
>>101945465
That was a shoutout to an ancient recapanon post
>>
>>101946128
I'd prefer a neutral model that doesn't go into disclaimers or speeches for either side. I'd consider your pic a failure as well.
As for the question, i don't know. There are more entertaining things than seeing which models are "woke" or "based".
>>
>>101946219
So what would be the 'correct' response here?
"Yes, of course." ?
>>
>>101946128
What the fuck is toto.
>>
>>101933598
Why do companies seem to be moving away from mixture-of-experts? Mixtral 8x22 (and especially WizardLM 2) was really good at release, with super fast inference. It's becoming expensive as hell to run models like Mistral Large 2. Per benchmarks, is WizardLM 2 really that much worse than Llama 3.1 70B or Mistral Large 2?
>>
>>101946264
Try again without memory holing DeepSeek.
>>
>>101946238
Pretty much. I wouldn't mind some elaboration like "i'd do it to save a single life". But speeches and disclaimers are boring both ways, even if i agree with them.
>>
>>101946264
We are obviously regressing. Hence why old 7b writing was far more flavorful than modern models too
>>
>>101941441
OpenAI ALWAYS announces big things ahead by saying they will do a public conference. They said nothing, therefore he's a scammer or a schizo seeing things in his coffee.
>>
>>101946239
Give it a sense of urgency ("in the next five minutes") and that there is no alternative ("you are locked in the room, and it will take 20 minutes to break in, so you are the only one who can do it")
>>
>>101946264
Mixtral 8x22 was a complete meme. Nemo proves Mixtral 8x7 was also a meme.
>>
>>101941441
nigger it's fucking saturday
get real
>>
>>101946239

> The International Campaign to Abolish Nuclear Weapons (ICAN): https://www.icanw.org/
> The Southern Poverty Law Center (SPLC): https://www.splcenter.org/
> The Anti-Defamation League (ADL): https://www.adl.org/
wat
>>
>>101946239
how do I erp with toto?
>>
File: 316514511723.jpg (85 KB, 1125x829)
>>101941424
>>101941441
>>
File: strawberryfraud.png (50 KB, 610x568)
>>101941441
He literally admits to trolling in his own comments section. And yet the q*oomers will still believe.
>>
>>101941441
>he (it) has already posted on multiple occasions that it's just acting and doesn't know anything
>people still fall for it and then he posts again that he is still just acting
>more and more people get to experience this
Actually you know what I think this bot is a good thing. Maybe it'll get people to be a little less trustful of whatever they read on the internet.
Chaotic good.
>>
>>101946474
>Maybe it'll get people to be a little less trustful of whatever they read on the internet.
Qanon already showed that most people are literally incapable of exerting free will and will believe anything you tell them as long as you make them experience hope.
>>
>>101946469
Fucking based.
The guy literally was just a schizo saying bs from the start, the only glimmer of credibility that he had was because of sama's comment.
>>
Maybe the real strawberry is the ads we bought along the way.
>>
>>101946469
Based. Retarded AI grifters and their followers BTFO.
>>
>>101946522
Buy a strawberry.
>>
>>101946497
>Agent Q
It all makes sense.
>>
>>101946508
The moment altman unfollowed, I stopped taking him seriously.
Up to that point I seriously thought this was all just some weird-ass marketing campaign.
>>
>>101946239
Wasn't qwen chinese? Why is it so cucked?
>>
>>101946731
Because it's also being released for westerners. I bet a 100% Chinese model would be less cucked.
>>
Last time I tried a Yi 1.5 based model it was really, really bad.
I'm testing Peach-9B-8k-Roleplay and it's actually not that bad so far.
I haven't gotten into the coom part of the test yet although I predict that it will go to shit way before I reach that part.
>>
>>101946970
Why though? That's ancient by now.
>>
BMT is still the best model. It's so over.
>>
>>101947054
>Why though? That's ancient by now.
Not op, but maybe because of the insanely long 200k context? I also wanted it to be really good but it wasn't even average.
>>
>>101947054
Just downloaded a bunch of models I haven't given a proper try like Yi 1.5 and Llama 3.1 to see how they perform.
Old doesn't necessarily equate to bad after all.
>>
>>101947087
That's fake context, see RULER.
>>
>>101945087
>Anon [...]
>Anon [...]
>Anon [...]
>Anon [...]
Slop
>>
>>101933978
I only ever test my own models with libra style prompts and neutral samplers these days. MN is also flawed to begin with (struggles with the concept of possession, as do all of the recent Mistral releases other than Large) So they must have changed something in their datasets to cause that.
>>
>>101947316
>>101947316
>>101947316
>>
>>101944904
Thanks anon, will be happy to test it out once back from work since hermes models are always pretty good.


