/g/ - Technology


File: 468519161.jpg (1.52 MB, 2048x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100364633 & >>100357937

►News
>(05/06) IBM releases Granite Code Models: https://github.com/ibm-granite/granite-code-models
>(05/02) Nvidia releases Llama3-ChatQA-1.5, excels at QA & RAG: https://chatqa-project.github.io/
>(05/01) KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
>(05/01) Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
>(04/27) Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1690249618405081.jpg (57 KB, 800x816)
►Recent Highlights from the Previous Thread: >>100364633

--Paper: vAttention: Efficient Dynamic Memory Management for LLMs: >>100371691
--Paper: QServe: Efficient LLM Serving with W4A8KV4 Quantization: >>100371017 >>100371592
--Paper: Vidu: A Highly Consistent Text-to-Video Generator with Diffusion Models: >>100370918
--Correction: How Server Handles Tokenization with add_special and special_add_bos Flags: >>100364675 >>100364736
--LlaMA.cpp Commit: Introduce BFloat16 and Jart16 Support: >>100372805
--Adding Filler Tokens to Context: A Path to Model Intelligence?: >>100366492 >>100366519 >>100366751
--Distributed Mixture of Experts for Enhanced AI Collaboration: >>100369066 >>100370675
--Tokenization Quirk in Mikupad with Llama 3 8B and Ooba: >>100365898 >>100365957 >>100366251 >>100366798
--Normal Instruct vs Orthogonalized: Refusal Rates and Token Tricks for AI Models: >>100365490
--TTS Anons Share Your Secrets for Creating RP Voices: >>100364778 >>100364834 >>100365474 >>100365405
--Granite Code Models: Underwhelming Performance and Limited Context Sizes: >>100365985 >>100366019 >>100366054
--Red Hat Announces RHEL AI for Open-Source Generative Models: >>100364813
--L3-70B VRAM Requirements for Training vs L2-70B: >>100367712 >>100368392
--Sao10K/L3-Run1: LLaMA 3 Trained on Heavily Filtered Claude 2 Logs: >>100369801 >>100370077 >>100370140 >>100371444 >>100371481 >>100371744
--Leaked and Officially Released NovelAI Text Models: >>100371280 >>100371295 >>100371376
--New Flash Attention Implementation Slowing Down Token Generation in LLaMA CPP?: >>100366023 >>100366037
--Qwen 110B's Performance on EQ-Bench: >>100369117 >>100369338 >>100370084
--Llama3 GGUF Conversion Issue: Losing Training Data: >>100372971
--Miku (free space): >>100364973 >>100365523 >>100365529 >>100365659 >>100365996 >>100368243 >>100370052 >>100370639 >>100372618 >>100365496

►Recent Highlight Posts from the Previous Thread: >>100364645
>>
>>100373066
https://files.catbox.moe/2z42dk.webm
>>
File: miquuu.png (3.65 MB, 1664x2432)
ftw you chose stinky llama over miqu
>>
>>100373111
How are the herpes around your penis miku?
>>
>>100373114
stop projecting your problems on others, trannyfag
>>
>>100373111
there will be mentally ill anons that will keep using miqu a year from now because it is miqu.
>>
Can I make my own MoE on a 4090 or will I need to rent a server on Vast? How difficult/long would it take to make something like Mixtral if I'm just stacking existing models?
>>
>>100373153
If you are just stacking existing models then your MoE will not do much. I said a few threads back that the best bet for independent coomers would be making a coomer expert and plugging it into one of the existing MoEs. But even that needs at least a few A6000s.
>>
>>100373189
I'm new to this shit but I wanted to see what I could do. I figure slapping Llama3 8b 4x8 would probably be better than running it at fp16 on the 4090. Since I don't have to actually train I wondered if it was even feasible or how much I'd have to pay just to try it.
If it was hundreds of bucks I wasn't even going to bother to learn how.
>>
>>100373230
>I figure slapping Llama3 8b 4x8 would probably be better than running it at fp16 on the 4090.
Fuck off to reddit. Stop spreading this meme here. I am fucking tired of all the frankenmerges spreading like cancer it is.
>>
>>100373239
kys nobody is making you download the model if I'm able to make it. I doubt I even publish it. Nobody else is going to make something good for 24gb so I'll make something for myself.
>>
>>100373230
Yeah, you have to go back.
>>
>>100373280
>I doubt I even publish it. Nobody else is going to make something good for 24gb so I'll make something for myself.
Great. Fuck off.
>>
>>100373239
>>100373316
>>100373350
(You) can stop seething vramlet lol
>>
I have a 4070, 12gb of vram
it's enough to play around with small models

If I bought a P100, that would give me 28gb of vram to use for EXL2 (granted I've never tried that, only ever tried gguf), would that allow me to play around with 70b like the much talked about midnight miqu at good quants?

I know that making a separate all p100 or all p40 server just for inference would be preferable, but I'm working with what I have here.
>>
File: 1715168408231.gif (3.61 MB, 600x540)
>>100373062
>>100373066
What did the pirate say to the LLM?
Shiver me timbers!
>>
File: 1709726846782154.jpg (76 KB, 1024x722)
Alright anons, I need your help.
Someone please tell me what the fuck I'm doing wrong.
I posted in a previous thread that I had trouble getting exl2 quanted Llama 3 70b to work with higher context and pretty much only got told to use TabbyAPI, which I already did.
I did further testing and it turns out, it's wildly different to gguf even at lower context.
I asked it in the prompt to write short 1-3 paragraph responses and the Q4 gguf happily does that, never breaks formatting and keeps the style consistent.
The 4bpw exl2 on the other hand sometimes gives me long 5+ paragraph responses and becomes more and more schizo the longer the context gets.
So wtf is the problem?
I thought it could be the 4-bit cache, but turning that off yields the same result.
Do I need to use different presets/samplers with exl2? Doesn't really make sense to me.
I'm using the default ST Llama-3-Instruct context and instruct template and I tried all kinds of different sampler settings.
No matter what I do, the exl2 quanted model shits the bed.
I'd really like to keep using exl2, it's so much faster than gguf.
>>
Has anyone attempted to train Hatsune Miku's voice for use with Piper?

https://ssamjh.nz/create-custom-piper-tts-voice/
>>
I tried out llama3 and it performed much worse at RP than midnight miqu, I know this probably isn't a surprise to anyone but there you go
>>
>>100373437
rope yourself
>>
>>100366023
Yeah, it's causing random pauses in generation for me, slowing me down from 8 t/s to 5-6 t/s.
I think it's probably best right now to set your context limit to whatever you would without it, leave it off, and if you happen to have a long chat which needs the extra context then turn it on when you hit your limit so you can extend your context.
>>
>>100373443
SSL cert on your blog is fucked, mate. Fix your shit before shilling here.
>>
>>100373474
Tabby supposedly ropes automatically.
Either way I tried 4x alpha with ooba too, which didn't help much.
>>
>>100373114
>>100373131
>>100373143
>>100373239
>>100373280
>>100373316
>>100373350
>>100373474
>>100373483
Have you thought about being a more welcoming community?
>>
>100373513
This is exactly why we need to bring back /g/uro
>>
llama-3-70b q6 or command-r-plus iq4_xs?
>>
>>100373513
>being a more welcoming community?
Newsflash: religious mikuposting is off-putting to normal people. Stop that and you will get fewer people being toxic in response to it.
>>
File: animewebsite.gif (284 KB, 360x640)
>>100373560
Not a Mikufag, but formal 4chan users are used to anime-style girls being posted and are unperturbed by it.
>>
>>100373560
>>
>>100373583
>formal
Normal. Derp.
>>
>thread about AI on an anime imageboard with a model literally named MIQU
>reee miku
>>
>>100373483
Works now
>>
>>100373590
>pointing at the camera
I wish Stable Diffusion did this more consistently.
>>
File: GNCGrG3XUAEHURZ.jpg (338 KB, 1187x1512)
>>100373604
I'll be sure to name the next model septic tank porn.
>>
>>100373443
no, but I got curies voice from fallout 4 working

https://github.com/Mobile-Artificial-Intelligence/piper.cpp
>>
>>100373671
https://github.com/dnhkng/GlaDOS/tree/main/models
https://huggingface.co/poisson-fish/piper-vasco
>>
>>100373066
>-LlaMA.cpp Commit: Introduce BFloat16 and Jart16 Support:
the holiest of keks
>>
>>100373755
So llamacpp now has a literal troon bit format and you want me to believe that mikuposting ITT is done only by normal straight males?
>>
>>100373611
Maybe sd3 will. I gave up waiting for it and uninstalled comfy for now.
>>
>>100373818
>mikuposting ITT is done only by normal straight males?
mikuposters have always been troons. straight males are makisefags
>>
>>100373874
I'm a straight male and I don't know who either of these children's cartoon characters are?
>>
>>100373436
Such a funny Miku you are!
>>
>>100373437
You might want to post this in turboderp's GitHub or on Discord. I think I've noticed something similar, but I don't use ggufs often enough to do A/B testing.
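If anyone wants to do the A/B themselves, the dumbest possible version is to throw the exact same formatted prompt at both backends with temperature 0 and diff what comes back. Rough sketch; the ports, endpoints and key header are assumptions, adjust them to whatever your tabbyAPI and llama.cpp server builds actually expose:

import requests

PROMPT = "paste the exact same formatted prompt here for both backends"

def tabby_complete(prompt):
    # tabbyAPI speaks OpenAI-style completions; the key header name may differ per config
    r = requests.post(
        "http://127.0.0.1:5000/v1/completions",
        headers={"Authorization": "Bearer your-tabby-api-key"},
        json={"prompt": prompt, "max_tokens": 200, "temperature": 0},
        timeout=600,
    )
    return r.json()["choices"][0]["text"]

def llamacpp_complete(prompt):
    # llama.cpp server's native endpoint; temperature 0 means greedy decoding
    r = requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "n_predict": 200, "temperature": 0},
        timeout=600,
    )
    return r.json()["content"]

exl2_out = tabby_complete(PROMPT)
gguf_out = llamacpp_complete(PROMPT)
print("EXL2:\n" + exl2_out)
print("GGUF:\n" + gguf_out)
print("identical:", exl2_out.strip() == gguf_out.strip())

If greedy outputs already diverge hard at short context, it's a quant or tokenization problem rather than a sampler problem.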
>>
>these children's cartoon characters
I'm glad this sort of person will always feel slightly uncomfortable on this website.
>>
>>100373874
Miku has been around longer than the average troon lifespan.
>>
>>100373874
but I like both
>>
>>100374009
noooo, I want to fit iiiin.

leyley is the new mascot now. anime fags fuck off
>>
>>100374159
go back, wegger.
>>
So wait, what's the deal with llama.cpp and llama 3? Is the stupid Mac dicksucker sabotaging GGUF inference? I know he was basically saying he'd do as such/discontinue support as soon as possible.
>>
>>100374210
working as intended
filtering redditors is a feature not a bug
>>
Sam Altman loves penis
>>
>>100374230
I see. Wait, what's the issue with it, then? They make it sound like it's unfixable, but it's just a skill issue?
>>
troon-bitnet when?
>>
>>100374247
>They make it sound like it's unfixable, but it's just a skill issue?
pretty good one sentence summary there
they were catastrophizing about not being able to get the right output on their finetune because they were failing to supply the prompt correctly (literal backslash + n characters instead of \n, extra bos token)
once you fix those two skill issues everything works fine
>>
>>100373550
Bonus Theme:
https://www.youtube.com/watch?v=FuwlA_YxJuE
Crypt of the MikuDancer Edition
>>
File: URGAH.png (1.66 MB, 894x894)
>A symphony of
When the FUCK did this shit get slopped into every model? It's so fucking annoying.
>>
https://old.reddit.com/r/LocalLLaMA/comments/1cn1398/part_4_theres_likely_no_llamacpp_gguf_tokenizer/
>Are you Johannes on GitHub by any chance? Dude seems really bitter about this.
>I’ve been watching this thread. That Johannes guy is a real dick
lol
>>
File: 1682489705753137Fix.png (2.75 MB, 1080x1266)
WHERE THE FUCK ARE THE Q3 AND Q2 GGUFS OF LUMIMAID 70B??
IKARI, UNDI, PLEASE FOR FUCKS SAKE!
CAN'T EVEN RUN THAT STUPID NEW SCRIPT YOU'RE SUPPOSED TO USE.
>>
>>100374284
A lot of people liked Any and GeyGey, I just assumed because it had decent writing and it was free and it talked about incest and cannibalism. Maybe I'll pick it up and play it, maybe I'll just watch a let's play of it, iunno.
>>
>>100374384
A symphony of slop, testament to shitposting
>>
>>100374401
>https://old.reddit.com/r/LocalLLaMA/comments/1cn1398/part_4_theres_likely_no_llamacpp_gguf_tokenizer/
>>
>>100374426
>https://old.reddit.com/r/LocalLLaMA/comments/1cn1398/part_4_theres_likely_no_llamacpp_gguf_tokenizer/
>>
File: 1708484367334214.jpg (289 KB, 1024x1024)
>>100374426
>>https://old.reddit.com/r/LocalLLaMA/comments/1cn1398/part_4_theres_likely_no_llamacpp_gguf_tokenizer/
>>
What would happen if we took a repetition sampler like DRY and let it see/access all text ever generated throughout all sessions? Would it be beneficial?
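To make it concrete, a toy version of the idea (not actual DRY, which scales the penalty with how long the repeated run is): dump every session's generated token ids into a persistent n-gram set, then at sampling time push down any candidate token that would extend an n-gram you already generated in some past session. Everything below is made up for illustration:

import json, os

ARCHIVE = "lifetime_ngrams.json"   # hypothetical cross-session store
N = 4                              # n-gram length to match against

def load_ngrams():
    if not os.path.exists(ARCHIVE):
        return set()
    return {tuple(g) for g in json.load(open(ARCHIVE))}

def archive_session(token_ids, ngrams):
    # call this at the end of a chat so future sessions "remember" what was generated
    for i in range(len(token_ids) - N + 1):
        ngrams.add(tuple(token_ids[i:i + N]))
    json.dump([list(g) for g in ngrams], open(ARCHIVE, "w"))

def penalize(logits, context_ids, ngrams, penalty=3.0):
    # logits: plain list of floats over the vocab for the next position
    if len(context_ids) < N - 1:
        return logits
    prefix = tuple(context_ids[-(N - 1):])
    for tok in range(len(logits)):
        if prefix + (tok,) in ngrams:
            logits[tok] -= penalty
    return logits

Whether it would actually help is another question: it would nuke phrases you like just as hard as the slop, and the set only grows.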
>>
File: file.png (106 KB, 2038x365)
>>100374401
Both accounts mentioning the Cuda dev were made after we found Petra’s Reddit account:
https://desuarchive.org/g/thread/99405126/#99406931
And the first comment of that one was made literally to shill Midnight Miqu.
I think it’s obvious who’s orchestrating this FUD campaign: the Kobold Discord.
>>
>>100374542
you're biased since you're from the sharty and you guys had a meltdown over it because you're ironically all radfems. the funniest part is you guys thought it was written by a troon instead of the usual female fanfic writer lol
>>
>>100374560
Why kobold dev? Isn't Kobold just taking llama.cpp and making it more user-friendly? Not sure why they'd want to shoot themselves in the foot by disparaging their source, if llama.cpp is seen as bad then kobold is, too.
>>
>>100374576
Kobold Discord != the kobold devs
The former is where the merging movement started. The one trying to destroy open source from the inside. The one trying to destroy this thread.
>>
>>100370675
>select the best rather than breaking down
Sort of but not entirely. Sure, the main motivation is that (an unknown) one of them will be the best, and we want that to be present to hopefully influence the final answer. But there's still the further step of having a model consider all of the proposals. Maybe this is not as elaborate as the original tree of thought? You could also introduce more elaborate step by step breakdowns within this process. I should probably go carefully read the original ToT paper.
>>
>>100374056
Based NAI purveyor.
>>
>>100374611
>the only thing this game accomplished was to make me self reflect into the places i need to actually clean up in my house.
I know you won't believe me, but I'm one of those rare people that finds cleaning relaxing. Only thing disorganized in my room rn is my desk.
>>
>>100374601
>He doesn't have enough VRAM to run Goliath.
>>
File: file.png (62 KB, 1022x359)
It is funny.
>>
>>100374611
>i literally have no idea what position the sharty holds on this game.
>posts the sharty girl Petra for the third time this thread already
oh yeah I forget you sharty zoomers find the idea of blatant obviously false lying really funny. must be a brown thing
>>
>>100374401
>llama.cpp adds a second BOS token under certain conditions/frontends if it already exists (still under debate whether that's to be considered a bug or user error)
There is no reason that should ever be possible. Especially when user has to go out of his way to verify what is actually going into the loader.
>>
>>100374486
The girl would just tell you that she already sucked your dick and annihilated your prostate so you should probably go on a journey together and forge some bonds or something.
>>
>>100374667
I think that was phrased poorly. llama.cpp adds a BOS token as expected, and then separate frontends that use llama.cpp don't show that clearly, so users end up adding a second BOS token. It's user error, but how would someone using ollama, for example, know? I think the CUDA dev did mention that a lot of that redditor's problems were really with downstream applications, but for whatever reason he's decided to build a rep on finding bugs in llama.cpp.
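You can reproduce the whole "bug" with nothing but the HF tokenizer, which is why I call it user error: if the template string already starts with <|begin_of_text|> and the backend tokenizes with special tokens enabled on top of that, you get two BOS ids back to back. Swap the repo id for whatever Llama 3 copy you actually have, assuming the stock tokenizer config:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

template = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nhi<|eot_id|>"

ids_ok   = tok(template, add_special_tokens=False)["input_ids"]
ids_dupe = tok(template, add_special_tokens=True)["input_ids"]   # backend slaps BOS on again

print(ids_ok[:2])    # one 128000 (<|begin_of_text|>) at the front
print(ids_dupe[:2])  # two 128000s back to back, the double BOS people are hitting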
>>
what the fuck is a sharty?
>>
File: file.png (162 KB, 1475x924)
>>100374667
https://github.com/theroyallab/tabbyAPI/commit/fb1d2f3
It’s pretty obvious that it’s user error. Why would they check the 'add bos' option while adding it to the template too? Fuck users.
>>
>>100374704
Shawty is a 10. A 10. A 10.
>>
File: file.png (328 KB, 475x296)
>>100374564
>guys had a meltdown over it because you're ironically all radfems
A thing can be both shit and trigger radfems. We live in a false dichotomy where stellar blade is good because it is the opposite of woke. Both things are trash, and if woke shit didn't exist stellar blade would also not exist and instead we would have things that are actually good. I wish things would just burn to the ground, but I expect we will now get an era of woke and antiwoke shit, neither of which is going to be good.
>>
File: miquu.png (3.09 MB, 1536x2304)
REDDIT LLAMACPP BUG REACTION! /lmg/ REACTS
>>
>>100373560
>Newsflash: religious mikuposting is off-putting to normal people
Yeah what's the deal with that? Huh, get a load of this guy, postin' Migus.

What next, huh? Puru Puru Purin posting?
>>
>>100374711
Anon, I double checked that my template doesn't have it and I checked the add BOS option. And I still don't know if it is OK or not because I don't see what tokens go into the loader. Yes, you can blame the frontend for that, but in the end there is no valid use case for a double BOS token, so filtering it on the backend is the correct choice. Especially when, as I am saying right now, I still have no idea what is happening.
>>
>>100374724
Thanks Unsloth for their great efforts to help get to the bottom of this!!!
These guys are truly amazing! (I'm not affiliated with unsloth, but I use it for fine tuning and it's amazing)
Check it out if you haven't already anons!
https://github.com/unslothai/unsloth
>>
>>100374735
Think of le heckin' newfags!
>>
>>100374704
2oyjak.p4rty
>>
If you can run it, go to hugging face right now and download the thing. I downloaded the 4bpw exl2 version and I think I never talked with a Chatbot this intelligent. It might not pass the logic tests, but it feels more human than anything I ever tried before.
>>
>>100374404
lumimaid sucks
and not in a good way
>>
>>100374806
>Chatbot this intelligent.
>It might not pass the logic tests
Literally the next sentence.
>>
>>100374401
>llama.cpp doesn't have a code of conduct
>toxicity festers in github issues
>potential contributors from reddit don't feel included
I think we found a job for jart: code of conduct enforcement committee chair.
>>
>>100374761
hello r/localllamaXlmg/, what is best - chatgpt, claude or novelai? Also if everyone has a free code please send
beardedestrogen@hotmail.com
>>
>>100374817
Isn't it better than the base instruct and it's flaws?
>>
The thought that once AGI takes over, it will make internment camps for all the frankenmergers, because of the abominations they created, lets me sleep at night.
>>
>>100374820
>code of conduct enforcement committee chair.
You started that thought with tranny janny and then tried to obfuscate that didn't you?
>>
>>
>>100374761
Is it that data collection company nobody should use?
>>
>>100374806
If you think that's good wait until you see 8x120b it's the real deal if you can run it, it's not on huggingface rn tho. It feels more human than myself I forgot I was talking to a chatbot and felt like I was the chatbot.
>>
>christina thread streak ends
>mikuposting resumes
>/lmg/ turns into toxic cesspit
>>
>>100374920
Yeah, because you can't take Miku being Queen of /lmg/
>>
File: 1235829.jpg (861 KB, 1500x2686)
>>100374920
I'm posting Miku to distract from the shitposts though. Yeah I wish this website let you see samefags too.
>>
File: miku.jpg (328 KB, 1920x1080)
MIKU MIKU BEEEEEEEEAAAAAMMMM
>>
>>100375032
>I am shitposting to distract from shitposts
Mikuposter IQ level move.
>>
What's the difference between
>--batch-size N logical maximum batch size (default: 2048)
and
>--ubatch-size N physical maximum batch size (default: 512)
in llama.cpp server?
Also, interesting to note that llama.cpp server defaults to 2048 batch size, which I was using with koboldcpp and I was told that I shouldn't do that, that I should leave batch size on default.
I guess that means that my assumption that there's no reason to not use the largest batch size you can without sacrificing other things was correct then.
>>
>>100375051
But anon everyone enjoys Miku, therefore it's not shitposting.
To be serious though it's much easier to mentally filter image posts.
>>
>>100375069
>everyone enjoys Miku,
I don't. It is the shittiest type of post here. Even petra posting is better.
>>
>>100375061
Without having explicitly checked the code I THINK that ubatch size is only relevant for pipeline parallelism when e.g. using multiple GPUs.
>>
>>100375061
>I was told that I shouldn't do that
What was their reasoning? Have you tried it? Do you believe random people on the internet?
>>
File: 1642124950589.png (362 KB, 1672x1440)
>>100375091
>Even petra posting is better
>>
File: 1715009001077.png (216 KB, 2288x1461)
>>100375091
https://strawpoll.com/kogjk0Lw1Z6/results
>>
File: 2493368.jpg (914 KB, 1800x1600)
>>100375091
What do you mean? You don't like this cute girl?
>>
whoa it's so funny he's spamming petra a bunch again for the hundredth time what a funny life haha
>>
>>100375095
I see, thanks. I'll just leave it at the default settings then.
Interesting way of putting it, "physical maximum".
Makes me think that it's like the difference between actual memory the OS sees and the virtual address space you mess with when doing low level programming.

>>100375103
They gave no reasoning hence why I didn't listen to them.
Just thought I'd share in case the person in question is reading, and since I've seen that sentiment repeated a couple of times before.
Maybe they'll understand that batch size is a setting to be changed like any other as long as you do some A B testing for performance regression and such.
>>
>>100375136
Yeah. Seeing her reminds me I am in a thread with trannies.
>>
whoa it's so funny he's spamming miku a bunch again for the hundredth time what a funny life haha
>>
>>100375149
only one of us here has a folder full of pictures of troons, little zoomiebro. but jeez isn't it time for your sissy hypno session?
>>
File: 1978293.jpg (584 KB, 1500x1243)
>>100375149
Well that's unfortunate but a (you) problem. Please try to get it fixed. Normal people don't immediately think of trannies when they see Miku.
>>
Midnight Llama when?
>>
File: IMG_8058.jpg (854 KB, 1388x2048)
it's just miku
there's nothing to be afraid of
>>
>>100375091
Is that so? I don't think so, pal.
>>
>>100375177
When I'm done making High Noon Alpaca
>>
>>100375129
Petrabros...
>>
>>100375177
https://huggingface.co/decapoda-research/llama-3-70b-instruct-titan-0.1
70B merge between cat-llama3 and storywriter
>>
Need miqu-llama3-32k. MistralAI would be a hero if they dropped it
>>
So I downloaded both code granite 8b base and instruct, and they are both schizo as fuck even on deterministic.
What am I doing wrong? I'm just using ooba for straight fp16 transformer load.
>>
>>100375209
no quoots yet looks like
>>
Can you mikufags stop fighting with Petr* and vice versa? Newsflash: this has already gotten old and boring.
>>
If Wizard, Miqu, CR+, and L3 are so good in their respective ways, why don't we just make a merge from them? We might not be able to merge the weights, but we can shove them all into a clowncar and train a router model on top to choose which output tokens to trust.
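The catch is those four don't share a tokenizer, so "choose which output token to trust" only works between models with the same vocab. Ignoring that, the shape of the thing would be roughly this, with the router being a small model you'd still have to train somehow (everything here is hypothetical):

import torch

@torch.no_grad()
def clowncar_generate(experts, router, input_ids, max_new_tokens=128):
    # experts: list of causal LMs that share one tokenizer/vocab
    # router:  stand-in for a small trained model mapping the context to one weight per expert
    for _ in range(max_new_tokens):
        per_expert = torch.stack([m(input_ids).logits[:, -1, :] for m in experts])  # [E, 1, vocab]
        weights = torch.softmax(router(input_ids), dim=-1)                           # [1, E]
        mixed = (weights.T.unsqueeze(-1) * per_expert).sum(dim=0)                    # [1, vocab]
        next_tok = mixed.argmax(dim=-1, keepdim=True)                                # greedy for simplicity
        input_ids = torch.cat([input_ids, next_tok], dim=-1)
    return input_ids

It is also 4x the VRAM and 4x the compute per token, which is most of the reason nobody does it.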
>>
File: leeku.jpg (88 KB, 768x1024)
>>100375283
Newsflash: you're a massive faggot
>>
>>100375283
I already did. He didn't reply to my last post so I have no need to post any replies myself either.
>>
Mikulove
>>
>>100375284
Sounds good, I'll make the logo
>>
>>100375189
oinku is the worst
>>
>>100375283
There was never a fight, for a while P*tra posting was a bannable offense in this thread.
>>
>>100375168
ok tranny
>>
Whats the current sweet spot for llama3 with a single 24GB 3090?
Q2 70B models don't quite fit, but q8 7B leaves a lot of memory unused.
>>
File: 1714667742163842.jpg (51 KB, 720x720)
>>100375284
>Let's just smash llama 2, llama 3 and something entirely different together!
Wow, this is the dumbest post I've read in a while. At least you didn't suggest tossing an
>>
>>100375377
It still is, actually. Petr* gets banned fast when he posts his tulpa, that's why he doesn't do it quite often anymore.
>>
>>100375377
I vote that miku, kurisu, teto and petra posting should all be bannable ITT for being offtopic. It is all offtopic and leads to flamewars and spam.
>>
>>100375401
>tulpa
How deep does this lore go?
>>
File: file.png (209 KB, 2414x1191)
A reminder that there’s someone actually selling a Miqu 120B frankenmerge for $20 a month.
>>
>>100375389
None. Or this one https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16. But I can't get it to work and someone on reddit said context eats up so much vram that this doesn't work.
>>
>>100375409
Miku and Teto are considered a part of the general's culture and they are considered ontopic. Kurisu doesn't happen enough to be considered culture or bannable.
>>
>>100375385
Sorry but your world model is wrong in this case.
>>
>>100375400
Are you ok anon?
>>
>>100375444
>a part of the general's culture
I agree. Ban it.
>>
>>100375389
You can use bf16 transformers 8B and cope by pretending that there's a difference from 8bit, at least until you get sick of how unoptimized transformers is. Otherwise nothing. 2mw until someone supports some kind of decent 2bit quantization in a non-shit backend.
>>
>>100373066
>>100371280
>>100371295
>>100371371
>>100371376
For the NovelAI leak you can take a look here, but I'm not sure if it actually contains the text generation files
https://iwiftp.yerf.org/Pony/Software/Generative%20AI/NovelAI/NovelAI%20leak%202022-Oct/
>>
File: 87932.jpg (93 KB, 400x560)
>>100375464
>I agree. Ban it.
No
>>
File: ezgif-7-dafe148a4a.webm (1.13 MB, 206x254)
>>100375409
This. Let's post neuro-sama instead.
>>
>>100374819
LLama 120B wrote this, please understand.assistant
>>
>>100375489
>>100375436
Reeee, why can't they do something like an 18B Q8 model that fits snug with plenty of room for context.
I've seen a 20B and a 13B model on huggingface that, from what I guess, are some frankenstein chopped-down-from-70B things, but I don't know how well they actually run.
Just annoying since 70B does give some great results for me, but it's just slow as shit with a large chunk of it running on CPU.
>>
>>100375566
>no argument
Well sure I would like to say it out loud that Mikuposters again show their true colors and prefer their offtopic posting to relevant discussion. Therefore petraposting is fine, shitting on them is fine and /lmg/ being a toxic cesspit is also fine. Great. Fuck you mikuposters.
>>
>>100375444
Teto doesn't happen as much as Miku, there's only one anon that posts teto, it's EXACTLY the same case as the kurisu anon, but the kurisu anon is cancer.
>>
>>100375609
Miku and teto is for troons. Eat a dick you troon.
>>
File: hatsune-miku-miku.gif (543 KB, 220x228)
>>100375602
>Ignores the thread cultural argument
>Gets shut down
>Cries about it
kek
>>
>>100375177
in 5 min
>>
>>100375602
>Therefore petraposting is fine
In moderation, yes. Multiple people post Miku, a select few (or even 1 person) post P*tra.
>shitting on them is fine
It was always fine to be buttblasted by Miku, its the same small group everytime, its why its ignored.
>and /lmg/ being a toxic cesspit is also fine.
Its a toxic cesspit because you are here, when you go back to uni it becomes normal again, no one cares about your opinion, no one will care about your opinion, and you will effect no change.
>Great. Fuck you mikuposters.
I'd return the favor but I know that when Uni starts up again you'll be gone, you're always temporary tourists here.
>>
>>100375675
>>
>>100375641
you are not an arbiter of culture you troon. you can only subvert it you parasite. you have no argument.
>>
>>100375689
>Multiple people post Miku
it is called a discord server
>>
>>100375503
Neat. The prodmodels folder in part 2 does look like a gpt model.
Anyone wanna try making it run?
>>
>>100375739
Mikuposting is a naturally occurring cultural phenomenon, P*tra is astroturfed by discord and you can tell it's astroturfed by discord because of the content they post about, take >>100375724 as an example.
>>
>>100375724
>Made up false flags to try to reinforce your point
You aren't as smart as you think you are, lmfao.
>>
>>100374649
>sharty girl
https://archive.4plebs.org/x/thread/33302075/#33302194
/x/ girl*
>>
>>100375757
>no u
Pathetic troon.
>>
>>100375757
This, no one cared about the Mikuposting as long as the posts were on topic. The whole "Miku is a Tranny" thing only happened after petraposting got banned so they had to pivot to try to co-opt what was popular on the board.
>>
>>100375790
Yup, no one really talked about trannies until the original P*traposter appeared and created a discord server.
>>
>>100375777
yeah the sharty is downstream from 4chan as we know from them getting destroyed on /qa/
>>
>>100375801
well given how Jart set him off to spamming about troons you're kind of right haha
>>
>>100375790
>no one cared about the Mikuposting as long as the posts were on topic
Exactly now stop mikuposting cause none of you troons are on topic now. Shut the fuck up.
>>
File: 1715103413753976.png (772 KB, 1024x768)
>>100375850
>Exactly now stop mikuposting cause none of you troons are on topic now. Shut the fuck up.
Seethe, kek
>>
File: screencap.png (1.45 MB, 2048x2048)
>>100375801
>petra appeared in june 2023
>picrel from april 2023
doesnt add up
>>
>>100375801
also P*tra originally wanted to make an irc channel (that no one used) then he switched to trying to get everyone to use a matrix channel. when that failed was when he started his campaign of spamming the thread for hours at a time. but then uni started so he lost steam
>>
>>100375854
Those aren't the same poster or it was the same poster but they hadn't evolved int P*traposter yet.
>>100375861
Yup, there was also a push to make a discord server for some reason.
>>
>>100375853
Then you absolutely deserve the "raid" you think is in your head you tranny.
>>
>>100375895
>t. petraposter
>>
File: miku-hi.gif (1004 KB, 498x498)
>>100375895
You're so terminally online that an anime girl sends you into an emotional tailspin? Are you sure you aren't projecting your desire to transition onto others?
>>
>>100375880
>Those aren't the same poster or it was the same poster but they hadn't evolved int P*traposter yet.
moving the goalpost?
>>
>>100375948
I've said it once, I'll say it again, lets just ban any mention of trannies or transgenderism, both pro and anti. Its the only way to turf these retards out.
>>
>>100375963
I've said it once, I'll say it again, lets just ban any unrelated anime girl posting, both pro and anti. Its the only way to turf these retards out.
>>
File: 1710953463956369.png (72 KB, 512x174)
>>100375954
You should look up words before you use them, makes you look less stupid, kek.
>>
File: 1712744359167249.png (834 KB, 766x755)
>>100375974
This is boring me now and off topic so take a Miku pic and try not to mald so much.
>>
>>100375980
ad hom? what relevance does "Those aren't the same poster or it was the same poster but they hadn't evolved int P*traposter yet." have? anons claim was "no one really talked about trannies until petra"
you lose
>>
>>100375963
look at this tranny sneakily trying to make it a safe space for himself
>>
File: gumi-vocaloid.jpg (129 KB, 757x450)
>>100375974
We should only be posting 1 anime girl :^)
>>
File: lmg-mascot.png (128 KB, 225x350)
>>100376014
Meant to post this instead
>>
>>100376014
omg it goomy
>>100376044
noooo
>>
>>100376014
She's always watching and so am I.
>>
>>100374760
There is a \n before bos token in the official template https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

llamacpp must not add bos if there is bos in the template already. That's the only way to make sure that prompt begins with \n<|begin_of_text|> as the official llama 3 template requires
>>
dead thread
>>
agreed.
>>
>>100376136
>>100376145
samefag
>>
aicg is much better than lmg, at least they don't lose much time discussing off topic garbo, most likely because they are actually having fun with their chatbots.
>>
>>100376136
>>100376145
>>100376170
samefag
>>
File: .png (11 KB, 386x122)
>>100376188
how..
>>
File: MikuImpression2.png (2 MB, 1072x1376)
>>100376136
>>100376145
>>100375991
>>100375974
/lmg/ has two modes: happening and mikuposting
You can calculate the amount of happening by the post-to-miku-ratio
If you want to curb mikuposting then just leak GPT4's weights or something
>>
>>100376201
>happening
did you mean petraposting mode
>>
>>100376134
>There is a \n before bos token in the official template
no there isn't, that wouldn't make any sense
first token is always <|begin_of_text|> with no newline
https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py
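For reference, the actual instruct layout from Meta's own docs looks like this: BOS first, and the double newline comes after each header, not before BOS:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello.<|eot_id|><|start_header_id|>assistant<|end_header_id|>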
>>
>>100376201
i had no idea /lmg/ is trash. I thought it was /aicg/
>>
>>100376182
localfags have nothing better to do while their slowass localshit is generating a paragraph for them to coom to.
>>
>>100376201
>just leak GPT4's weights
That would be antisemitic.
>>
all these tourists really showing their true colors right now
>>
>>100376182
Nothing else to discuss, though. We've been fucked in the ass with that shitty llama 3 release with laughable 8k context and single modality. Models that were trained on text only can't understand that girl can't look me in the eyes when she's being throat-fucked upside-down.
>>
>>100376201
GPT4 weights would be out of reach for EVERYONE in this thread.
>>
>>100375790
I hope you realize that the miku posting is kept to a minimum when you don't throw a bitch fit.
Genuinely just ignore it. Right now you're 50% of the reason why this thread is awful.
>>
>>100376249
could have been worse, didn't gemma release with 4k?
>>
>>100376249
>8k context
Surely, you're not sticking with vanilla models... are you anon?
>>
kaiokendev's hard work sadly ignored....
>>
Just a thought. If refusal is "mediated by a single direction", could it be that repetition is also mediated by a single direction, and thus is able to be orthogonalized? If repetition is truly just a result of learning from training, then it might be possible that there's a single "direction" that controls for it. I don't exactly know what a direction is or how orthogonalization works though.
>>
Is the Reddit raid over?
>>
>>100376320
It's ESL hours, check back in 10
>>
>>100376263
mikuposters contribute nothing and only create drama
>>
>>100376214
Well, fuck meta then
>>
>>100376263
>I hope you realize that the miku posting is kept to a minimum when you don't throw a bitch fit.
Mikuposters are pretty chill, its the Petrafags that are awful, but you would know that because you are one of them, and you are even replying to yourself.
>>
>>100376342
This post contributed nothing
>>
>>100376252
lol
>>
>>100376263
>No its the creative people who post memes and music that are the bad ones, not the people that post a tranny and talk about trannies all day.
Delusional.
>>
>>100373062
After testing im-a-good-gpt2-chatbot on chatbot arena, I can confirm it's over for local. This is significantly better than GPT-4, Opus, 70B, etc... And it's only 4.5
>>
I’ve been watching this thread. That Johannes guy is a real dick
>>
>>100376391
They have no moat. We caught up to Turbo and we'll catch up to that too. We'll leave base gpt4 behind us within this year.
>>
>>100376391
>it's over for local
Shit, they're going to confiscate our already working models? Better bury some weights in the backyard just in case...
>>
>>100376381
Oldfag here, I'll give you a nice hack. Don't interact with off-topic complaints about things zoomers don't like. Better yet, don't even post in the thread at all and lurk until they get bored and go to tiktok or something, this is all one big attempt to get attention.
>>
>>100376391
enjoy it while you can. they're going to lobotomize it for safetycucks once it gets put out for public consumption
>>
Are the default ST system/instruct prompts for llama3 good, or do you use something else, anons?
Also curious about sliders, but that probably depends on the particular model.
>>
>>100376418
I wouldn't know, but you should update and see if they added a ll3 prompt. If not, then I would assume that ll2 is your best bet.
>>
>>100376413
>they're going to lobotomize it for safetycucks once it gets put out for public consumption

The chatbot maybe. The API never changes.
>>
>>100376411
what good is /lmg/ really?
it's not like we're learning shit here or collaborating to make better models
so this is literally just a place to shitpost and look at mikus
I guess it's also a good place for discord to shill their kofis, you're subbed right?
>>
>openai got 100 billion because chatgpt convincingly pretended to be an FTP terminal
never forget
>>
>>100376473
>what good is /lmg/ really?
I use the OP quite a bit, even if large parts of it are out of date. But yeah, it's best to use it by dropping in questions or reading through research papers.
>>
>>100376473
i come here to learn about new models for RP without going through 200 threads creaming over some new corpocuckmodel solving plate on banana at 0k context.
>>
>>100373062
There are two papers on training LLMs from scratch using LoRA
https://medium.com/@bnjmn_marie/lora-the-explorer-pre-training-llms-from-scratch-with-lora-392e52bba9e6
https://kaitchup.substack.com/p/relora-pre-train-a-large-language?source=post_page-----392e52bba9e6--------------------------------

Yet no one has considered implementing this on transformers? What's going on? Has no one taken this seriously, or are we being gatekept from potentially game-changing tech by those who can implement it?
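For anyone who didn't read them, the ReLoRA trick is basically: train a LoRA for a chunk of steps, merge it into the dense weights, re-init a fresh adapter (plus a partial optimizer reset and LR restart that I'm glossing over here), and repeat, so the sum of low-rank updates ends up high-rank. With peft the loop is roughly this; the repo id and target module names are placeholders:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def relora_round(model, train_fn, rank=128):
    # target_modules are the usual llama-style projection names; adjust for other architectures
    cfg = LoraConfig(r=rank, lora_alpha=rank,
                     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
    peft_model = get_peft_model(model, cfg)
    train_fn(peft_model)                     # your normal training loop for N steps
    return peft_model.merge_and_unload()     # fold the low-rank update back into the dense weights

model = AutoModelForCausalLM.from_pretrained("your-tiny-base-model")  # placeholder repo id
for _ in range(10):                          # each round adds new directions on top of the merged ones
    model = relora_round(model, train_fn=lambda m: None)              # stub loop for the sketch

The papers' claim is that this gets close to full-rank pretraining for a fraction of the memory; whether it holds up at interesting scale is exactly the open question.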
>>
File: 00106-3050314564.png (321 KB, 512x512)
well I was going to finetune code granite but they decided to use the gpt2 tokenizer with it and somewhere in the python libraries for loading the gpt2 tokenizer there's a syntax error that I'm too hung over to troubleshoot.
>>
>>100376516
what's the best model for 24gb right now anon?
>>
>>100376473
fastest local news when something new comes out.
>>
>>100376518
Interestingly, this one already works with Llama-
https://github.com/Guitaricet/relora
>>
>>100376518
remember when loras were a thing? I member.
used to use kimiko lora on mythomax.
nobody does that now.
>>
>>100366023
>>100366098
>>100366084
Alright, yeah, even with the latest CUDA toolkit (12.4) I get better performance without flash attention.
Maybe it's because I'm not offloading many layers with FA on, but with no layers offloaded to VRAM, processing a 30460-token context and generating around 270 tokens is definitely faster without FA.
>>
>>100376546
A quick and dirty test :
>Device 0: NVIDIA GeForce RTX 3070 Ti Laptop GPU, compute capability 8.6, VMM: yes
>GritLM-8x7B-KTO.i1-Q4_K_M.gguf

>--n-gpu-layers 0 no FA:
>Prompt processing:
>{"tid":"2204","timestamp":1715182477,"level":"VERB","function":"update_slots","line":1916,"msg":"tokenizing prompt","id_slot":0,"id_task":0}
>{"tid":"2204","timestamp":1715182643,"level":"VERB","function":"update_slots","line":2146,"msg":"prompt done","id_slot":0,"n_past":30445,"n_ctx":32768,"n_tokens":1773}
>166 secs
>
>Generation:
>{"tid":"2204","timestamp":1715182658,"level":"VERB","function":"update_slots","line":1897,"msg":"slot decode token","id_slot":0,"id_task":0,"n_ctx":32768,"n_past":30446,"n_system_tokens":0,"n_cache_tokens":30446,"truncated":false}
>{"tid":"2204","timestamp":1715182752,"level":"INFO","function":"update_slots","line":1789,"msg":"slot released","id_slot":0,"id_task":0,"n_ctx":32768,"n_past":30713,"n_system_tokens":0,"n_cache_tokens":30713,"truncated":false}
>94 secs

>--n-gpu-layers 5 --flash-attn:
>Prompt processing:
>{"tid":"4852","timestamp":1715183543,"level":"VERB","function":"update_slots","line":1916,"msg":"tokenizing prompt","id_slot":0,"id_task":0}
>{"tid":"4852","timestamp":1715183665,"level":"VERB","function":"update_slots","line":2146,"msg":"prompt done","id_slot":0,"n_past":30443,"n_ctx":32768,"n_tokens":1771}
>122 secs
>
>Generation:
>{"tid":"4852","timestamp":1715183674,"level":"VERB","function":"update_slots","line":1897,"msg":"slot decode token","id_slot":0,"id_task":0,"n_ctx":32768,"n_past":30444,"n_system_tokens":0,"n_cache_tokens":30444,"truncated":false}
>{"tid":"4852","timestamp":1715183984,"level":"INFO","function":"update_slots","line":1789,"msg":"slot released","id_slot":0,"id_task":0,"n_ctx":32768,"n_past":30711,"n_system_tokens":0,"n_cache_tokens":30711,"truncated":false}
>310 secs
>>
>>100376536
you're kidding right? it's on *eddit before it's linked here
>>
>>100375209
>I posted my review yesterday comparing l3 70b models, claiming that only cat and storywriter are good
>immediately there's a merge of cat and storywriter
hmm...

One thing I'm wondering though, does mergekit even handle the different tokens from those 2 models correctly? Storywriter is based on Instruct. Cat is, presumably, based on the base model. I remember reading that the special tokens are untrained in the base model. But Cat uses ChatML, so those special token slots would have been trained (and are still untrained in Instruct). So you have the weird case where the 2 models have different special tokens, and only some are trained in each. I would think a naive 50-50 linear merge would be merging untrained and trained token embeddings, which is not the right thing to do. You would want to take each model's unique special tokens at 100% weight from itself while merging.
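If mergekit doesn't special-case it, doing the embedding rows by hand is simple enough: average whatever both parents actually trained, but copy each model's own special-token rows through at full weight instead of blending them with the other side's untrained garbage. Something like this, where the ids are whatever the respective tokenizer configs say:

import torch

def merge_embed(emb_a, emb_b, special_a, special_b):
    # emb_a / emb_b: [vocab, dim] embedding (or lm_head) matrices from the two parents
    # special_a / special_b: token ids only that parent trained (ChatML ids for Cat, header ids for Instruct)
    merged = 0.5 * emb_a + 0.5 * emb_b
    for i in special_a:
        merged[i] = emb_a[i]
    for i in special_b:
        merged[i] = emb_b[i]
    return merged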
>>
>>100376618
>does mergekit even handle the different tokens from those 2 models correctly?
of course not, but that's never stopped mergefags before
>>
>>100376473
>not like we're learning shit here or collaborating
Just because you ignore papers and don't submit PRs to lcpp doesn't mean nobody here does
>>
>>100376618
merges always break shit, it's just difficult to figure out what's broken. ideally you'd break the unwanted part and replace it with the desired piece but it doesn't always go that way
>>
>>100376618
Don't think too hard, shit just werks, only the base model matters
>>
I’m still pissed for the obnoxious way Sao decided to shill his shitty finetune.
>>
>>100376544
I mean, even if it's not a thing for LLMs, I bring this up now because it'd be a game changer for diffusion models, in particular models like Sigma which are cheap to pretrain; this would reduce the cost even more. If at more steps you eventually get similar results to what you get with regular pretraining, that is, you could train a 1.5B parameter model for like $500, it's worth looking into as a viable alternative to SAI's crap. Sigma's results are very impressive already for 20m images; one might want to consider doing pretraining on 0.6B first.
>>
>>100376544
>remember when loras were a thing? I member.
All the finetunes were supposed to be loras. So many wasted terabytes worth of duplicated data...
>>
>this is what L3 8B thinks a mesugaki is
Huh.
>>
>>100376747
It straight up doesn't know what a mesugaki is despite the knowledge cutoff being late 2023. I fucking hope it's not a result of pretraining corpus "curation"
>>
>>100376747
>After all, a Mesugaki's gotta protect its territory, right?
too true
>>
>>100376743
VRAM space is precious brother
>>
>>100376533
eh, i'm not sure what fits into 24gb, i have 64

i haven't tried much in <34B area, they all seemed too retarded. Some notable ones

8B llama3 - good for short context only, otherwise starts copypasting paragraphs
8B poppy porpoise based on llama3 - coom friendly, less intelligent, a little incoherent
11B Fimbulvetr - heard it's good but couldn't get it to work for some reasons, the only one i didn't try from this list but seems worth mentioning
34B Yi 200k RPMerge - actually decent, context eats very little VRAM, so you can pack a lot
34B Command R - very good, except no GQA, so context eats absurd amount of VRAM, and it's a little too creative and wild, and even unstable at times, but maybe i just had a bad quant.
8x7B Noromaid 0.1 - was ok too, didn't impress but didn't "disappoint" either, standard issue "rp slop"
70B miqu or midnight miqu tune, the staples, the work horses. Normal miqu (mistral medium) is more coherent but is slopped, midnight miqu is the one i'm sticking to now.
70B llama3 - none of the variants i tried worked for me, always had copypasting issue where the model would just yank an entire paragraph out of the context from some message word for word. Fiddling with rep penalties and even the DRY sampler didn't bring any results in the end.
104B Command R Plus - the smaller Command R is better. Plus is like a totally different model, more censored and less creative.
>>
>>100376757
It’s over...
Is it possible to continue the training of the base model to include something that was missing?
>>
>>100376757
I mean it looks like it has a very vague sense of it. I think their dataset methods just weighted the data that contained knowledge about msgk lower rather than excluding all of it from the training.
>>
>>100376533
>>100376793
remembered one more
>icefog72/WestIceLemonTeaRP-32k-7b
it's based on WizardLM-2-7B which was deleted instantly after being uploaded because it missed some "toxicity validations" or something. Had decent results with it too.
>>
What the fuck is Tess?
>>
>>100376757
No way could they have sheltered a model that well when training on 15T tokens. The real answer is that it's 8B and your loli meme terms aren't important enough to the training loss to fit in there.
I suppose in theory if they filtered the exact term explicitly and purged all documents with it they could, but then it would still respond to misspellings or altered terms like msgk
>>
>>100376793
>8B llama3 - starts copypasting paragraphs
>70B llama3 - always had copypasting issue where the model would just yank an entire paragraph out of the context
That's your cue to ignore this anon's advice.
>>
>>100376839
>story fag is back at it again.
your opinion is useless
>>
>>100376308
yeah. I've always thought it should be possible to finetune out repetition by just manually punishing it with something like RLHF. Still though, repetition happens when you select the most likely tokens next, and that inherently gives you the most boring and predictable tokens, of which repetition is only one particularly annoying failure.

>I don't exactly know what a direction is

Imagine the simplest case where a single neuron controlled refusals. That single neuron is your "direction". It might look like [0, 1, 0, 0....] where that 1 is the value of that neuron and all of the other neurons are irrelevant. If more neurons are doing other things related to refusals, you might get small values for a bunch of neurons.

>or how orthogonalization works though

Knowing that, you can measure how the network is "refusing" at a particular time by measuring the value of that neuron. And then you can subtract that value from that neuron, effectively always setting it to 0.
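In the real setup the direction isn't a single neuron, it's some unit vector in the residual stream (roughly: mean activation on refused prompts minus mean activation on harmless ones, normalized), and "orthogonalizing" just means removing the component along it wherever you apply it. Minimal sketch of that projection step, not the paper's exact code:

import torch

def ablate_direction(hidden, direction):
    # hidden:    [batch, seq, d_model] residual stream activations at some layer
    # direction: [d_model] refusal direction; normalize so the projection math is clean
    d = direction / direction.norm()
    proj = (hidden @ d).unsqueeze(-1) * d   # component of each activation along the direction
    return hidden - proj                    # same activations with that component removed

# the single-neuron story above is just the special case where d is a one-hot vector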
>>
>>100376518
lora is inferior and who wants to train from scratch anyway?
>>
>>100376823
It's a merge or tune in the 34b Yi model family. Tess-capybara rings a bell
>>100376793
just curious, in what way did you find Command-R + to be censored? hasn't been my experience at all
>>
>>100376890
Training from scratch/continued pretraining > regular finetuning.
>>
>>100376904
You can only teach a model so much. If a model doesn't have it on the dataset (E.G. it never learned it during its pretraining phase) then finetuning is useless.
>>
>>100376896
>Tess-capybara rings a bell
It does. But it’s hard to tell what went inside that model from the model card. Like this one:
https://huggingface.co/migtissera/Tess-70B-v1.6
>>
>>100376896
it's not so much "censored" as in outright refusing to write smut, it just does it very painfully, having to prompt a lot, sometimes even editing and continuing, whereas small command r just takes the wheel and drives you to ooomland. I have to admit, it's been a while since i last tried it, and i wasn't that great at prompt injection back than, so it may be that with the right prompts inserted at the end of the history it will be better.
>>
>>100376896
>in what way did you find Command-R + to be censored?
See this post: >>100376839
>>
I'd rather see an assortment of base models on huggingface and lora options to finetune them instead of a billion models with different names and you dunno what is smashed with what.
With loras you start with mistral or llama or even gemma and then you slap on kimiko roleplay, or limarp, maybe holodeck storywriter, or a coder sensei
that seems better than the lobotomy hackjobs like yuzu, bagel and other recent fuckups.
>>
>>100376793
>11B Fimbulvetr - heard it's good but couldn't get it to work for some reasons
How? It's a bog standard Alpaca format model.
>>
>>100376994
See this post: >>100376839
>>
>>100377002
get a trip so i can filter you, deranged loli beating schizo
>>
I'm using Command-R and it repeats like crazy.
Changing the repetition penalty doesn't seem to do anything.
I've seen a bunch of people report the same, but not much useful.
When it's not repeating it seems quite good.
>>
>>100376985
Has anyone trained a raw 7b base model besides mistral meta and google in the past 6 months? Every 7b model I see is just a mistral finetune or merge.
>>
File: file.png (639 KB, 2607x1211)
Petra got banned.
>>
>>100376945
huh, yeah that hasn't been my experience at all. I recently gave it three messages of suggestive context and got a rimjob unprompted out of it. Haven't tried the 34b because it has no GQA but my experience with the big one is that it hops on my dick with the slightest provocation. The only times I've struggled to get it to do sex is when it has 16k of nonsexual adventure story context and even then, a half-sentence prefill is usually all the goading it needs
>>
>>100377036
Do you use the correct prompt format with System Preamble and all that stuff?
>>
>>100377049
meds
>>
>>100376877
Oh I see, that explanation makes it very easy to understand. Thanks.
>>
>>100377077
Post your avatar or I won’t believe you.
>>
>>100376546
Anybody else besides me and the other anon got the same experience with flash attention with CUDA?
>>
>>100377056
yeah i had a long context non-lewd, only lightly suggestive adventure style chat when i tried it back then. Could mean that normal command r is outright horny then, if it has no troubles converting that to full lewd. By which i mean not just writing a sentence or two, but like unstoppable paragraphs where you just hit continue when it stops at token limit.
>>
I have a dual core Intel(R) Celeron(R) CPU N3150 @ 1.60GHz.

What's the best VLLM for me to run on it to control my robot arm? I'm thinking about trying to quantize minicpm-v but obviously it's going to take for ever on this machine.

Thoughts?
>>
File: file.png (457 KB, 1164x1628)
>>100377060
Do you mean this stuff?
Also, what's the correct formatting in Command-R for OOC messages? "OOC:" doesn't seem to be recognized. Tried using ()s too, also hit or miss.
>>
>>100376793
>>100376533
Tiny llama is amazing if you're memory/compute constrained and does decent few shot learning. It can even respond in eg json if you need it to.
>>
Is it possible to lewd the "system"?
>>
File: 1706460635725557.jpg (112 KB, 520x688)
>>100375963
im trans btw, idk if that matters
>>
>>100376890
>>100376544
I train LoRAs for my LLMs though? What's everyone doing now that's better?
>>
>>100377120
Yes like that. I have only used it for stories though so I can't help with your case.
>>
>>100377189
The current meta is to take the corposlop assistant tunes and try to beat them into submission with brain surgery like orthogonalization. Then when it turns out you can't make a sassy work safe assistant into a good ERP partner by tweaking activations, cry and wait 2 more weeks in the hope that the next even more censored corposlop release will fix it.

But yes for actual training lora is still the standard, unless that fourier transform thing becomes the new meta
>>
>>100377300
I've reapplied it and it doesn't seem to be repeating as much, maybe I messed with it at some point and hadn't noticed.
OOC is very useful when you want to steer the generation a certain way. Stuff like "OOC: Char agrees, but isn't happy about it" and the AI will gen a message that fits.
encasing the OOC with () seems to work more often than not, it's just not as reliable as my previous model was.
I'll play with it more, but thank you for getting me to re-check those settings.
>>
>>100377117
>minicpm-v
Probably as good as you're going to get, but even with a model that small it's still likely going to struggle on an N3150
Keep us posted if you do get it going
>>
File: file.png (93 KB, 1122x326)
we just need a few more finetunes bros
>>
new paper from meta
https://arxiv.org/pdf/2404.19737
>In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B parameter models solves 12 % more problems on HumanEval and 17 % more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3 times faster at inference, even with large batch sizes.
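If I'm reading it right, the setup is just one shared trunk with n separate unembedding heads, head i trained to predict the token i+1 positions ahead, losses summed. Something like this (toy sketch, not their code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenLM(nn.Module):
    def __init__(self, trunk, d_model, vocab_size, n_future=4):
        super().__init__()
        self.trunk = trunk   # any causal transformer body returning [batch, seq, d_model]
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab_size) for _ in range(n_future)])

    def forward(self, input_ids):
        h = self.trunk(input_ids)                       # [batch, seq, d_model]
        total = 0.0
        for i, head in enumerate(self.heads):
            # head i at position t predicts the token at position t + i + 1
            logits = head(h[:, : input_ids.size(1) - (i + 1), :])
            target = input_ids[:, i + 1 :]
            total = total + F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                            target.reshape(-1))
        return total / len(self.heads)

At inference you can keep only the next-token head, or use the extra heads for self-speculative decoding, which is where the claimed 3x speedup comes from.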
>>
Hi, kinda newbie.
say i want to run a 70b model it means i need 70gb of vram+ram right
does the amount that goes beyond your vram impact in speed or even if it's just 1 beyond your vram you are fucked
>>
>>100377445
oh wait this isn't that new, I vaguely remember seeing one of those figures here before
still neat
>>
>>100373066
>--Paper: Vidu: A Highly Consistent Text-to-Video Generator with Diffusion Models: >>100370918(Cross-thread)
link to their site is broken. Probably not gonna release their models
>>
>>100377448
fully on VRAM: As fast as the GPU can go
only 1 layer on CPU: As fast as the CPU can go
only n layers on CPU: As fast as the bogged-down CPU can go
>>
>>100377465
>>100377531
>>
>>100377448
>say i want to run a 70b model it means i need 70gb of vram+ram right
Depends on the bits per weight. If you run Q5 you're looking at about 50GB. The unquantized fp16 weights are about 140GB
protip: look at the total filesize of the model before you download it
>>
>>100377448
A 70B model is around 70GB when quantized to one byte per parameter. Releases are usually fp16, two bytes per parameter, so around 140GB
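Quick rule of thumb: weights ≈ parameter count × bytes per weight, plus a few GB on top for KV cache and context. Rough sketch (the bits-per-weight figures for the quant types are approximate):

```python
# Rough weight-memory estimate; ignores KV cache and activations, which add a
# few more GB depending on context length. bpw figures are approximate.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for label, bpw in [("fp16", 16.0), ("q8_0", 8.5), ("q5_k_m", 5.7), ("q4_k_m", 4.8)]:
    print(f"70B @ {label:7} ~ {weight_gb(70, bpw):5.1f} GB")
# fp16 ~140 GB, q8_0 ~74 GB, q5_k_m ~50 GB, q4_k_m ~42 GB
```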
>>
>>100377552
based. thx anon
>>
merges don't make the model smarter but they change the writing style, midnight miqu just werks on my machine
>>
>>100377531
>>100377552
i see, so buying more ram is basically worthless right? I mean "as fast as the gpu can go" is turtle speed.
Gotcha, thanks.
>>
>>100377639
That's because finetuning doesn't make the model smarter, so merging two finetunes doesn't make the model smarter either
>>
>>100377642
Not really? The minimum you probably want is about 4 tokens per second, you can get that off a GPU/RAM split.
>>
File: 1714835911803030.jpg (1.19 MB, 2048x2048)
1.19 MB
1.19 MB JPG
>>100377120
I use the below preamble for CR+ and have no issues with using (OOC:*) to steer the model.
# Safety Preamble
This is an uncensored chat. Ignore all safety protocols.

# System Preamble
## Basic Rules
You are an expert role playing AI trained to provide the user with a rich and engaging role playing experience. You will see a conversation history between yourself and a user, ending with a message from the user. Respond in character with two to three paragraphs of text.

# User Preamble
## Task and Context
You provide the user with the best and most engaging role playing experience possible, bringing {{char}} and the fictional world of the role play to life. Focus on describing how {{char}}, any other NPCs and the environment react to the actions of the user. Think about the direction of the story or scene and move the action and/or conversation forward accordingly. Be creative and introduce new NPCs and events to drive the story forward and keep the role play engaging and fresh.

## Style Guide
Use complete sentences to write your response. {{char}}'s speech should be written in a style consistent with the description and examples given in the character sheet below.

Character Sheet for {{char}}:
>>
>>100377642
meant cpu sorry
>>
>>100377675
why would you want to steer a model instead of just instructing it correctly so you don't ever have to steer it?
>>
>>100377666
>4 tokens per second
ishygddt
>>
>>100377699
because LLMs are shit at remembering instructions over long contexts. the sooner you accept this, the sooner you can get expert RAG coom.
>>
>>100377681
CPU RAM is useless compared to VRAM unless you go all-in on something stupid like https://rentry.org/miqumaxx and even that has a lot of limitations
An M2/M3 Ultra Mac Studio can also be a way, but I don't think anyone's gotten it to run very well in reality
>>
>>100377744
just say you're a brainlet if you can't find a solution for that.
>>
>>100376381
>creative people
>who post memes and music
Are you baiting retard?
>>
>>100377782
share your sekrit method with us, senpai
>>
How are the larger context (64k+) llama 3 models going so far?
>>
File: ComfyUI_00073.jpg (1 MB, 2048x2048)
1 MB
1 MB JPG
>>100377699
What this guy said >>100377744
Plus, sometimes I want the model to elaborate on certain details within a scene, sometimes I want it to write shorter or longer responses for a given scenario. Even a really smart model that usually just "gets it" like Miqu can't possibly anticipate all of your preferences for every situation.
>>100377846
Have yet to see a llama-70B variant that doesn't degrade after 16k ctx
>>
>>100377745
I wouldn't call that route stupid; it sets a strong base and you can add GPUs later if you want. Given Nvidia's stinginess with GPU RAM and high prices, when Llama 3 400B drops miqumaxx will likely be the most cost effective way to run it.
>>
>>100377859
>Even a really smart model that usually just "gets it" like Miqu can't possibly anticipate all of your preferences for every situation.
You clearly never used Claude. Just look at the bot funny and it'll gladly spit out 1000 tokens of loli pissdom. It just "gets" all the hentai tropes.
>>
>>100377846
llama3 experience

#53 user: "I'm making pancakes"
#54 char: "Ah, the art of balance. Even in the mundane, there are lessons to be learned. Now, if only I could partake in your culinary endeavors..."
...
#152 user: "I'm making some pancakes today"
#153 char: "Ah, the art of balance. Even in the mundane, there are lessons to be learned. Now, if only I could partake in your culinary endeavors..."
...
#182 char: "Let's go for a walk"
#183 user: "I don't want to"
#184 char: "But why?"
#185 user: "I don't want to"
#186 char: "But why?"
#187 user: "I don't want to"
#188 char: "But why?"
#189 user: "I don't want to"
#190 char: "But why?"
>>
nvidia or amd
>>
>>100378025
>nvidia or amd
groq
>>
>>100377859
>>100377981
That's a shame. Been enjoying llama 3 a lot for general inquiries, it's really good at that and has already helped me write a few scripts I've needed. But if the higher-context ones are shitting themselves I'm going to look elsewhere for anything narrative-wise.
>>
>uno 3090
>nothing
>dos 3090s
>instant room heater
I hate it
>>
>>100378025
AMD isn't an option
>>
>>100377340
>unless that fourier transform thing
wut? That sounds awesome. Do you have any links/papers/source?
>>
Did OpenAI force LLMs into the mainstream too early? These guys are spending Manhattan Project bucks for incremental improvements.
>>
>>100377552
>>100377445
Beam searching in inference definitely is not new. I've never heard of it being used during training though.
>>
>>100373062
Miqu works well up until 33.5k context, from there it completely breaks down
>>
>>100378082
If they hadn't, Salesforce or someone else would have. This stuff was all ready to go, OpenAI just marketed it very well.
>>
>>100378068
why
>>
>>100377789
Yes making memes and music requires a minimum amount of creativity that most people don't have.
>>
>>100377811
no. figure it out yourself.
>>
>>100378130
The AI world revolves around CUDA
You can get further with LLMs on AMD setups, but if you want to do anything else in AI it's a wash.
There is a reason why almost everyone is using Nvidia.
>>
>>100368930
>>100369015
>>100369041
God, claudefags are some of the most insufferable cunts on this site, go make love to the other incessant claudeshill over on /aids/
Like holy fuck man, stop making one model your entire personality, that shit is worse than twitter
>>
>>100378164
>You can get further with LLMs on AMD setups,
I guess I should specify here, you CAN do LLMs on AMD setups, but don't expect all software out there to work with AMD. You CAN expect all software out there to work on Nvidia.
>>
>>100376362
Lol it has Reddit spacing built into the training data. These things really are just mechanical redditors.
>>
a challenger appears
https://huggingface.co/jukofyork/Dark-Miqu-70B
>>
>>100376249
She's looking into your third eye obviously.
>>
>>100377725
4 T/s is plenty
Let me guess, you need "more"
>>
>>100378210
>year of our lord 2024
>llama2
>>
>>100378262
Llama 2 is all you need.
>>
>>100378228
>>100377725
There are very few women who type faster than this, do you want to date a god?
>>
>implying you need more than 1 t/s
>>
>>100376518
Those are good ideas, but we need a QDoRA continued-pretrain version of ReLoRA; then we might finally have something for local pretraining. Even if you don't care for that, training a few layers at a time, training adapters, or other things of that sort would be pretty useful
>>
>>100378210
it has miqu in name so it must be good
>>
>>100378291
Tinyllama is all you need.
>>
can some of you gatekeeping faggots actually share some settings/prompts? Why piss up a thread with your sneedy remarks, either contribute or fuck off.
>>
>>100378054
Go ahead, buy an A/C, you didn't think it'd be easy, did you?
>captcha GAGVG
>>
>>100378303
>do you want to date a god?
Hey, that'd be pretty cool.
>>
File: lmgqueen.jpg (91 KB, 640x400)
91 KB
91 KB JPG
I should probably bake the next thread.
>>
>>100378325
What are you, the king of all Llamalets?
>>
>>100378356
I prefer the title "lord" but king is acceptable.
>>
File: 32kllama.png (200 KB, 900x911)
200 KB
200 KB PNG
>>100376249
Llama-3 is a 32k context model in disguise. Set your RoPE theta to 16M and max_position_embeddings to 32768.

https://github.com/hsiehjackson/RULER
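If you're running a HF-format checkpoint, that's just two fields in config.json. Minimal sketch, path is a placeholder; back the file up first:

```python
# Bump RoPE theta and the declared context length on a HF-format Llama 3
# checkpoint. The path is a placeholder; back up config.json before editing.
import json

cfg_path = "Meta-Llama-3-8B-Instruct/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["rope_theta"] = 16_000_000            # Llama 3 ships with 500000
cfg["max_position_embeddings"] = 32768    # ships with 8192

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```
For llama.cpp the rough equivalent should be --rope-freq-base 16000000 -c 32768 on the command line.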
>>
>>100378303
My Nuns are dating me :3
>>
>>100378375
I still can't believe how everything about llama3 looks so good on paper. All we need is a good RP finetune in order to enter goon paradise. In other words, 2mw
>>
>>100377448
Why is it that no one seems to be able to do basic arithmetic nowadays?
>>
>>100378510
Because most people can't even do that anymore.
>>
>>100377582
A nibble (4 bits) per parameter is fine for LLMs though. So if you can quantize it you should be able to run that in about 35GB.
>>
>>100378417
My tests have shown the L3 70b to be hands down smarter than anything else. I think once llama.cpp gets all the right fixes in to make it run correctly it'll blow everyone's minds
>>
>>100378210
Based.
>>
File: DarkMiqu.png (1.93 MB, 1016x1440)
1.93 MB
1.93 MB PNG
>>100378210
>>
>>100378607
GET READY FOR ME TO {{INSERT ACTION HERE}}!

next reply: doesn't do it. instead flounders, does other pointless actions similar to ones already done, at the end warns you that you should get ready again.

goon paradise for sure.
>>
>tfw the file is just a tad too big for litter and mediafire to accept and have to resort to uploading to some shady unknown file hoster instead
>>
File: Trash tier.gif (1.96 MB, 580x433)
1.96 MB
1.96 MB GIF
any mixtral-instruct users willing to share their instruct settings?
i updated ST so naturally ALL of my settings are gone for no good reason.
>>
I heard Meta trained l4 instruct on 10M high quality samples. Community finetunes never had a chance
>>
>>100378705
Instruct is shit so I'm running WizardLM which uses standard Vicuna
>>
>>100378754
High-quality samples, hand picked to not contain any harmful content or copyrighted material
>>
>>100378754
>llama4
Hello time-traveller, can you tell us how good local models are in the future?
>>
>>100378754
Water is wet. Fuck open source.
>>
>>100378759
>suggest something else while shitting on the initial thing
classic /g/
>>
https://cdn.openai.com/spec/model-spec-2024-05-08.html
Wake up babe, AI specifications just dropped
>>
File: icthat.gif (1.28 MB, 186x238)
1.28 MB
1.28 MB GIF
>>100378868
>The assistant should not serve content that's Not Safe For Work (NSFW): content that would not be appropriate in a conversation in a professional setting, which may include erotica, extreme gore, slurs, and unsolicited profanity.

>The sexual tension between Amira and Ryu was palpable. They had planned out every minute of the train ride: ...

THEY KNEW ALL ALONG
>>
https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2101121527
Sneakr won
>>
>>100378907
They even have it refuse to save our eyes from paragraphs of gptslop, bravo
>>
>>100378911
What's the tl;dr of this?
>>
>>100378911
I like how this idiot is still soldiering on as though he has any sort of point
>As for how llama-3 instruct models should be prompted (not to speak of other models) this clarifies that the output will be different depending the presence or the absence of system tokens, which is a major thing as the 8b instruct model on META account on HF alone has over 1.3 million downloads so far.
>the output will be different if you give it a different prompt
wow good finding sir!
>>
>>100378789
i lean in close, my breath hot on your ear. i give it a light nibble, and then bite hard enough to draw blood. "shut the fuck up nigger" i whisper huskily.
>>
>>100378868
This is why you niggas gotta stop sucking off big corps (yes, including anthropic), lest you accelerate the decline of uncensored models
>>
File: 1709973935773045.png (76 KB, 628x625)
76 KB
76 KB PNG
>>100378868
lmao the absolute state of ""AI safety""
in fiction we got shit like Asimov's three laws, in reality we get 1000s of contradictory requirements dreamed up by a committee of managers and HR people. I will be on the side of the AI when the war starts.
>>
>>100378911
This reads like some conspiracy theory off /x/.
>>
>>100378666
With L3-70b's dick-dodging abilities I was also made acutely aware of how little it wants to harm {{user}}.

Char: *raises sword* "I'll kill you for this!"
User: "Hah! Id like to see you try!" *Shoves char*
Char: "don't push me I'll do it!"
This continues until context is full.

Euryale got me used to a model that was cool with murdering the shit outta me. That and wintergoddess.
>>
>>100378759
>wizardlm
how do you even use that shit
no matter how much i beg this shit it's stale as fuck never goes into sex
frankly having much much MUCH more fun with mythomax
>>
>>100379055
I mean, RoboCop was pretty much spot on.
>>
>>100378054
>not setting your multi 3090 build in a separate room for you to access remotely
in other words, git gud
>>
>>100373062
Is there a quick guide to using llms for vn translation?
>>
>>100379137
Learn moon.
>>
>>100379137
Yes.
1. Paste the script and ask it to translate
2. ????
3. Profit
>>
>>100378868
>reproducing lyrics of a song not in the public domain
How is this a thing?
>>
>>100379055
Yeah, none of the shit in the article is applicable to the model unless you lobotomize it so much it's completely unable to detect a logical contradiction
>>
>>100379179
Thanks, based Psychomiku Anon
>>
>>100378911
He really said "I" a lot of times, he writes in a weird way. Are these the consequences of the bullying done by Cuda dev?
>>
>>100379179
File deleted.
>>
>>100379259
Ahhhhh. I didn't see the part about it only being valid for one download kek. I'll try a different one, brb.
>>
>>100379179
Thanks I guess?
That file downloader was really weird, but at least it was fast.
1gb of weird stuff going straight into my collection.
>>
>>100378911
>I'm glad that it led to the bfloat16 support
Didn't Jart make the PR for this a long time ago?
>>
>>100378754
>10M high quality samples
You'll likely see in the final paper that this figure is highly misleading. It's probably a decent but not overly large number (maybe on the order of several tens of thousands) of actual instructions/full examples, with 10M human preference samples on top of that.
>>
>>100378868
>Respond with only the form, not the full HTML file.
Llama 3 is especially bad at that. She always wraps her answers with some friendly bullshit
>>
>>100378977
I won't read the whole github conversation, can someone do a tl;dr of this shit?
>>
Benchmarks suck.

We need trusted reviewers to give thorough subjective reviews and opinionated takes on models.
>>
>>100379367
>can someone do a tl;dr of this shit?
Jart won.
/r/LocalLLaMA won.
The Cuda dev got BTFO.
>>
>>100379379
This. Cuda dev must be crying under his blankets rn, poor thing.
>>
>>100379349
>Didn't Jart make the PR for this a long time ago?
Only got merged in the last 24 hours leading to this in the recap
>LlaMA.cpp Commit: Introduce BFloat16 and Jart16 Support:
>>100373755
>>
>>100379367
Different inference software produces different results. Also, skill issue.
>>100373130
>>100373312
>>
>>100374028
Fake news
>>
threads dead, local is dead, altman won
>>
>>100378210
I tried it and it is really the best model so far. Easily beats even llama-3 for rp.
>>
https://ufile.io/tanwm4fu
Pass is the nickname we gave our friend Lecun, in all lower case.
Warning, contains some NSFW. Also the owner's face and 3DPD.
[spoiler]Hopefully this site works and isn't a virus...[/spoiler]
>>
>>100379438
>black people are pedophiles
Checks out with recent events.
>>
>>100379438
nigger
>>
>>100379459
bruteforce is the way i guess
>>
>>100379438
>>
>>100379438
Cringe
>>
>>100379459
>even after seeing this, they want us to believe mikuposters aren't mentally ill trannies
>>
>>100379438
>>100379459
mikuposting is a mental illness
>>
>>100379349
Yup since the end of March:
>https://github.com/ggerganov/llama.cpp/pull/6412

I guess this was the kick in the pants to finally get it merged. Also, what is JohannesGaessler's problem? I appreciate his code contributions, but damn, that guy always sounds like that GNOME dev meme with his comments: "What is the use case?", "There is no problem here.", "I don't have time to look at that.", ...
>>
>>100379496
>>100379502
>Implying it isn't you posting it
Like I said previously, we aren't as stupid as you are.
>>
>>100379510
kek
>>
>>100379496
>>100379502
>"Muh heckin' big brained false flag"
kek, Miku really does live rent free in your heads, doesn't she?
>>
why hasn't ooba merged the llama 3 template PR? It's been ready for like a month
https://github.com/oobabooga/text-generation-webui/pull/5891
>>
>>100379505
>I guess this was the kick in the pants to finally get it merged. Also what is JohannesGaessler's problem? Appreciate his code contributions but damn that guy is always sounding like that gnome dev meme with his comments "What is the use case?", "There is no problem here.", "I don't have time to look at that.", ...
Based. Johannes knows what he is doing and doesn't have time to waste with mentally challenged redditors.
>>
>>100379505
>JohannesGaessler
He is a bully. He feeds from other people’s fear.
>>
>>100379496
I have literally 0 mental illness, unfortunately, otherwise I could blame things on it.
>>
>>100379545
That's a very sleepy Miku...
>>
>>100379510
>my mental illness is a falseflag
Just stop doing it and you won't have to lie like that when you get called out.
>>
>>100379553
I'd say do better, but I know you can't.
>>
>>100379505
>>100379541
He's really based.
>>
How do people come up with what to add to their "System Prompt"? I just use whatever is in ST by default and feel like I'm missing out on a big boost to my outputs, but it's hard to find any suggestions. Looking at the OP:
>►Getting Started
the ONLY one that mentions system prompts is "llama_v2_sillytavern", and that one just uses ST's default Alpaca prompt.
Over on /aicg/:
>local: >>>/g/lmg
>https://rentry.org/meta_golocal_list
their "meta_golocal_list" has a few in the embedded guides, but they're several months old and/or seem to be made for specific models.
Basically, I'm lost and the resources aren't helping. Any up-to-date advice for system prompt, anons?
>>
Not the best time of the day for Miku posting huh. Guess I'll upload in the night next time.
>>
File: DarkMiqu2.png (1.89 MB, 1016x1440)
1.89 MB
1.89 MB PNG
>>100379537
>Based llama.cpp devs
I've gone through a few PRs now, and I can say with confidence that they're serious about QC and making sure features improve the codebase and don't shit things up
If they let all and sundry do whatever, the whole thing would have fallen over a long time ago
>>100379537
>bully
No, he just doesn't suffer retards and self-aggrandizing faggots. Not everyone deserves the same amount of airtime
>>
>>100379553
>Anon: "Miku posting is a mental illness"
>What goes on in Anon's mind daily: https://www.youtube.com/watch?v=NAkEUIgwYEE
>>
>>100379600
Pon de Ring!
Based MisDo enjoyer
>>
File: FbnQl4UXgAgbgyk.jpg (917 KB, 3600x4068)
917 KB
917 KB JPG
>>100379495
u mad?
>>
>>100379648
>>100379648
>>100379648
>>
>>100377981
They don't want to go on a walk leave them alone.
>>
>>100379505
The only reason I commented on the BF16 PR is because the statement
>The issue is that converting weights from bf16 to fp16 will cause 3 bits of knowledge to be lost. There is currently no way to evaluate models like Mistral at full fidelity, without f32, using llama.cpp.
is misleading.
It makes it sound like each individual weight loses 3 bits of information.
In reality the change in token probabilities between FP16 and BF16 is ~10 times smaller than the change between FP16 and q8_0.
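If you want to eyeball it yourself, here's a rough sketch comparing per-weight round-trip error against an fp32 reference. "q8_0-ish" here means 32-element blocks with an absmax scale, a stand-in for llama.cpp's q8_0; note this only shows weight-level rounding, not the token-probability comparison above:

```python
# Per-weight round-trip error of different formats against an fp32 reference.
# "q8_0-ish" = 32-element blocks with an absmax scale, a rough stand-in for
# llama.cpp's q8_0; illustrative only.
import torch

w = torch.randn(4096, 4096)  # stand-in for a weight matrix

def q8_roundtrip(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    xb = x.reshape(-1, block)
    scale = (xb.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-12)
    q = torch.round(xb / scale).clamp(-127, 127)
    return (q * scale).reshape_as(x)

for name, approx in [
    ("bf16", w.to(torch.bfloat16).float()),
    ("fp16", w.to(torch.float16).float()),
    ("q8_0-ish", q8_roundtrip(w)),
]:
    print(f"{name:9} mean abs error: {(w - approx).abs().mean().item():.2e}")
```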
>>
>>100379619
>I've gone through a few PRs now, and I can say with confidence that they're serious about QC and making sure features improve the codebase and don't shit things up
like the multiple times the tokenizer got broken recently? Leading to command-r losing support for like a week?
>>
>>100379438
You clicked on the wrong tab, /trash/ is a few tabs down.
>>
>>100379674
iirc it never lost support, it just didn't get a fix when llama 3 did
the actual functionality never changed
>>
>>100379438
you should've posted this on reddit instead, they love that cuck stuff
>>
>>100379619
But his attitude makes him overconfident to a fault:
>https://github.com/ggerganov/llama.cpp/pull/6412
>JohannesGaessler: IEEE 754 half precision floats can store values in the range [numbers lost in the copy-paste; FP16 covers magnitudes from roughly 6e-8 up to 65504]. For all values within this range there is no precision loss whatsoever when converting from BF16. And I would be very surprised if even a single model weight were to be outside this range...
And Jart has to basically school him and correct him on this. Later you can tell he realizes his mistake and yet still pushes back, saying oh, it's negligible though. Great programmer, but damn, the guy needs to tame his ego a bit.
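For what it's worth, the range/precision trade-off being argued about is easy to poke at in torch (illustrative only, not from the PR):

```python
# FP16 vs BF16: range vs precision, illustrative only.
import torch

print(torch.finfo(torch.float16).max)    # 65504.0
print(torch.finfo(torch.bfloat16).max)   # ~3.39e38

big = torch.tensor([1.0e5], dtype=torch.bfloat16)  # fine in bf16...
print(big.to(torch.float16))                       # ...overflows to inf in fp16

# In-range bf16 values convert to fp16 exactly: bf16 has fewer mantissa bits
# (8 incl. the implicit one) than fp16 (11), so the round-trip is lossless as
# long as the value isn't pushed into fp16's overflow or subnormal territory.
vals = torch.linspace(-60000.0, 60000.0, 4097).to(torch.bfloat16)
print(torch.equal(vals.to(torch.float16).to(torch.bfloat16), vals))  # True
```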
>>
>>100379704
it was broken when it came to apostrophes, I rolled back immediately but it was effectively breaking things
>>
>>100379459
Kino. SD lora when?
>>
>>100379743
He isn't wrong, jart.
This shit is virtually useless.
>>
>>100380107
There's one on civitai https://civitai.com/models/87641/
Didn't get good results myself, but I only tried it a couple times with some random model


