/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101094602 & >>101081984

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101094602

--SOTA Model's Embarrassing Twitter Fail Exposes AI Limitations: >>101101305 >>101101416 >>101101449 >>101101532 >>101101579 >>101101664 >>101101909
--Nous Research Sent Cease & Desist Letter: >>101096307 >>101097160 >>101098613
--Local CAI Development: Did They Have a Special Sauce?: >>101100004 >>101100038 >>101100090 >>101100260 >>101100312 >>101100351 >>101100344 >>101100483 >>101100904 >>101100968
--The Limitations of Language-Only Models and the Need for Multimodality: >>101097409 >>101097654 >>101097888 >>101097950 >>101098119 >>101101733 >>101101804 >>101101779 >>101102072 >>101102155
--The Evolution of AI Terminology: Descriptive vs Prescriptive Language: >>101095632 >>101095651
--The Capabilities of 8B and 70B AI Models: Closing the Gap?: >>101102197 >>101102321 >>101102344 >>101102414 >>101102438 >>101102810
--Sampling First Characters and L3 8b Experiments with 32k Context and Yarn: >>101099291 >>101100665
--Optimizing LLM Model Performance on GPU with EXL2 and Layer Settings: >>101100992 >>101101070 >>101101148 >>101101242 >>101101287 >>101101379 >>101101560
--LLama-3 Roleplay: Looping Issue Due to LLM Limitations, Not Response Tokens: >>101099286 >>101099405 >>101099535 >>101099533
--Cheapest and Most Efficient RTX GPU for Local AI Model Deployment: >>101098944 >>101099021 >>101099118 >>101099179 >>101099598 >>101099345 >>101099589 >>101100793 >>101099681 >>101099708
--Anon's New Hardware for Training Rig and RAM Upgrade Considerations: >>101103643 >>101103660 >>101103970
--OpenSora: A Local Alternative to Luma for Efficient Video Production: >>101102450 >>101102905 >>101102933 >>101103025 >>101103114 >>101103164 >>101103232 >>101103413 >>101103489 >>101103539 >>101103577 >>101103629 >>101103694
--AirLLM: Viable Option for Model Deployment?: >>101094908 >>101097160 >>101098997 >>101097367
--Miku (free space): >>101094655 >>101094806

►Recent Highlight Posts from the Previous Thread: >>101094610
>>
Who wants to help me build AGI? Looking for this skillset:
- Self motivated
- Pure C programming
- Experience crafting machine learning algos from scratch
- Ability to read research papers and implement in code

We will create a small local model that can match GPT-4 benchmarks, then seek venture capital funding on the order of millions of dollars for access to compute clusters to train larger models.
If you are confident in your ability, now is your time to shine.
>>
a local model
>>
>>101104779
I think qwen still has the lead for generalist models but it really depends on how they progress from here
deepseek seems to have more going on in terms of innovative research, qwen team seems more focused on maxxing out derisked stuff. qwen is better positioned and probably has more resources with alibaba behind them but they need to become more forward thinking, deepseek seems to be on a better trajectory currently
>>
>>101104856
The model will be based on Mamba + Q-Learning + Generalization Acceleration (Secret Sauce)
>>
Best model for video game trivia?
>>
>koboldcpp
>half gigabyte of nigger bloat
>>
>>101104856
I don't want to have anything to do with you.
>>
>>101104888
I'm on your side fren, and I'll make you rich along the way.
>>
>>101104856
>We will create a small local model that can match GPT-4 benchmarks
people have been trying to do that for a year and a half now, with zero success; unless you're going the new architecture route, you're not gonna achieve that goal anytime soon
>>
I've now settled on 0.85 temp, 0.05 min p for magnum, I started out at 1.2 + 0.1 but I feel like temp >1 pushes it too far towards qwenslop cliches and ESL. using it with lower temp is like a completely different model, significantly better on both sovl and coherence
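For reference, a minimal sketch of what those two knobs do. Hedged: sampler order varies by backend (llama.cpp by default truncates before applying temperature), this sketch applies temp first, and the 0.85/0.05 defaults are just the settings above:
[code]
import numpy as np

def sample(logits, temperature=0.85, min_p=0.05):
    # temperature < 1 sharpens the distribution toward likely tokens
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()
    # min-p: drop every token whose probability is below min_p * p(top token)
    probs[probs < min_p * probs.max()] = 0.0
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
[/code]
Dropping temp below 1 thins out the garbage tail that temp > 1 inflates, which would explain the qwenslop cliches disappearing.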
>>
>>101104937
Yes, I have a new architecture route and new training concepts, as well as a "generalization accelerator" to reduce the time it takes to produce emergent properties from training (i.e. generalization abilities).

I need programmers and believers, not "never gonna happen" nobodies.
Due to certain research papers that have come out this year, the game has fundamentally changed and GPT-4 level local models have been feasible for about 4-6 months now. Nobody has done it simply because large teams move slowly due to bureaucracy, and these research teams are very busy creating new methods rather than looking into each other's work.
>>
>>101104944
That's a pretty good preset for most models really.
>>
>>101103970
>Dust free computer
I just went through this: If you can keep the room closed, you can use positive pressure to make sure the room itself stays mostly dust free.
I bought a gable fan that would just fit between my studs and hooked it up through an old variable fan-speed controller (like '70s old) so I could balance pressurization and noise. I put a furnace filter on the intake and use it to pressurize the room.
Despite children and animals in the house, the whole room has stayed clean for the last few months at least, so I think it's a solid strategy. It used to get gross almost within a week or two of cleaning.
I also put vent filters on the intakes of the actual case fans as well, since I had them and they fit well.
>>
Yi-large will save local models
>>
>>101105014
And you will, of course, post these papers you're referring to.
>>
>As I walked away, I heard Becky's nasal honk cutting through the din.
nasal honk is a new one to me
>>
File: 4chanAI.gif (99 KB, 1075x362)
<- Runs on CPU @ 14 t/s (AMD Ryzen 3 3350U, 2.10 GHz)
>>
>>101104937
>with zero success
lmao
https://arxiv.org/abs/2406.07394
>>
>>101105073
they can say whatever they want in their paper; if there's no local model I can test out myself, I call it bullshit
>>
>>101105058
For operational security purposes I will only share the bare minimum so that my "moat" is kept intact.
Mamba is one of those papers, and the infamous "Q*" algorithm is involved... There is more - beyond this I cannot share.

>>101105073
This paper exactly proves my point.
>>
>>101104856
g0t m4tr1x?
>>
>>101105088
There is a github linked retard. Just tell me you can't read code too.
>>
File: file.png (12 KB, 628x223)
>>101105099
kek
>>
>>101105119
no retard, matrix.org
>>
>>101105113
who gives a fuck nigger? as long as there's no local model accessible it means absolutely nothing; once they reach that part, then we can talk
>>
>>101105144
Yeah you can only consume like a retard. Learn to code or get back to /aicg/
>>
>conspiratorial
AAAAAAAAAAAAAAAAAA
>>
>>101105140
@named666:matrix.org
>>
>>101105168
looks like asking for real proof of your bogus claims is asking too much, noted
>>
File: ThePrize.png (87 KB, 1005x424)
Once we accomplish our goal and attain this prize, the sky is the limit. VCs will be begging for a chance to invest.
>>
>>101105230
aren't we open sourcing it under AGPL-3.0 doe
>Chrome on Windows
>>
I'll make the logo
>>
>>101105230
>>101105180

accept the request faggot
>>
File: 1708065020574993.png (6 KB, 625x127)
>>101105184
I'm making my own implementation as we speak dumbo, but I'm sure dooming on /lmg/ is more productive for you.
>>
>>101105180
nice honeypot for retards itt
>>
>>101105284
quit yapping and deliver a good model; if you can't do that, you're not much above the dumbos actually
>>
File: file.png (573 KB, 850x850)
so uh bros..
whos this
>>
>>101105357
I don't know. What do you mean? Are you asking about a card or something?
>>
... and then we never heard about him ever again.
the end.
>>
>>101105295
I have 0 interest in actual retards. Need programmers.

>>101105357
>>101105578
Artificially Infamous
>>
File: ComfyUI_00158_.png (1.1 MB, 1024x1024)
Anyone try talking face locally yet?

https://github.com/fudan-generative-vision/hallo

>This is Hedra, an online service.
>Gets more cursed when using anime

https://files.catbox.moe/cju3xa.mp4
https://files.catbox.moe/p25j8s.mp4
>>
File: file.png (275 KB, 506x465)
>>101105607
>
>>
>>101104856
I have this skillset and I have no interest whatsoever in working with someone that does not demonstrate any competency themselves.
My default assumption is that you're just some retarded ideas guy that I would be better off without.
>>
>>101105607
Cursed Megumin, I'd prefer a static image to that
>>
>>101105631
r/thanksihateit
>>
>>101105632
based
>>
File: 1717975471582543.jpg (84 KB, 1280x720)
>>101105607
We solved that ages ago
>>
>>101105607
Yikes. I'd rather just download a 3D model and hook it up to VRChat, which has great lip sync animation based on mic input.
>>
>>101105632
this.
>>
File: file.png (77 KB, 913x880)
>>101105632
I'm laying down most of the code already fren.
>>
>>101105686
all you're doing is posting snippets of the mamba.c source code, lol.
>>
File: file.png (105 KB, 1222x865)
>>101105738
Does that have backpropagation implemented sir? No, it doesn't.

btw I'm the one who introduced /g/ to mamba.c
I've been its advocate since day 1.
>>
File: 1547073060485.jpg (79 KB, 432x525)
>I love you, [user]
>>
>>101105799
Are you rich? Impossible with current mamba base models without lots of compute, even if you only care about math
>>
File: Hypervisor.png (650 KB, 607x535)
>>101105858
>tfw a rogue AI starts socially engineering humans to build a better version of itself until it's generally intelligent enough to build a better version of itself.
>>
>>101105863
>Impossible
We don't like that word around here sir.

Daddy gave me a small loan of $1,000,000 and I'm very stingy with it.
>>
File: 1630996531633.png (181 KB, 340x482)
>>101105877
And then, at the end of all that, what will it do with all of its improvements?
>>
File: bhi.gif (152 KB, 216x216)
>>101105930
>>
bros...............
>>
>>101106200
What
>>
>>101106200
I'm not your bro
>>
Finally
https://huggingface.co/mistralai/Mixtral-8x7B-v0.3
Instruct soon(tm) i guess
>>
File: miquu.png (1.31 MB, 768x1152)
turbcat appears to be more retarded than stheno 3.2 and hallucinates values in JSONs
>>
>>101106309
Nothing can be more retarded than Stheno.
>>
>>101104856
looks like its finally time to sell my nvidia stock
>>
>>101106330
it's not retarded though, it follows instructions and does shit when asked, like calculating time offsets and updating states
>>
Cohere is about to do it.
>>
>>101106342
the bubble is going to get bigger
>>
>>101106354
I used Euryale, it doesn't follow instructions well and it just wants to coom. Have you used vanilla Llama?
>>
>>101105631
Good lord, how horrifying.
>>
>>101106309
It is slightly stupider, yes.
I think the reason Stheno is shilled so much is that it seemingly has more colorful wording and longer replies by default than L3 8b instruct while not really being any dumber.
>>
>>101105063
how many bees?
>>
>>101106381
And some people did the same with the old Euryale and Fimbulvetr, when the former was just a merge and the latter who knows. Some people just come here to shill Sao models regardless of everything.
>>
>>101106389
14 tokens per second on laptop CPU
>>
>>101106412
>can't translate from retard speak to human
He's asking about the parameter count.
>>
>>101106298
damn this model is quite good, feels like a genuine update over the old one
>>
>>101106435
It's what plants crave it has electrolytes
>>
>>101106435
1B
>>
>>101106357
>do it.
do what?
>>
>>101106298
>404
>Sorry, we can't find the page you are looking for.
:(
>>
>>101106298
>>101106445
wowzerz! what a nice and totally not overused joke! here's your gold medal saar!
>>
File: 1719097357007.jpg (287 KB, 1080x1502)
>$4
lol
lmao
I can fine-tune models with less than a dollar on runpod, why is this so expensive
>>
>>101106447
Water? You mean from the toilet?!!
>>
>>101106498
hi, runpod shill. are you scared?
>>
>>101104774
Why is /aicg/ 90% pedofags? I feel like I should clear my cache every time I visit that thread because at least one of those cards probably has embedded 'p
>>
>>101106298
Holy shit
https://huggingface.co/anthropic/Sonnet-14B-3.5
>>
>>101106526
And yet they still have taste and a brain unlike 99% of /lmg/. This general is honestly an embarrassment for /g/.
>>
>>101106538
I like this leak even better
https://huggingface.co/OpenAI/GPT5-34b
>>
>>101106526
why are you trying to make me like /aicg/?
>>
File: 1692547497285611.jpg (8 KB, 225x224)
>>101106544
>pedos
>good tastes
>brain
>>
>>101106559
Enjoy your unquantized 8B, retard.
>>
>>101106563
i am not using your filtered slop, fuck off
>>
>>101106563
>Using FOSS
Based, manly. Likely respects children
>ERPs with proxy owners pretending to be a loli
Threat to society, unmeasurable levels of faggotry
>>
>>101106580
>/aicg/ are a bunch of braindead pedos
>trains a model on their logs
>omg this model is amazing
That's you, a complete retard.
>>
>>101106209
>>101106276
bros..........................................................
>>
>>101106588
bait or mental retardation, whatever, pedoshit removal is the only good thing about ai models censorship.
>>
>>101105594
I actually have some good use for retards. If some approach you, please just forward them to me. I have an offer they can't refuse. Just tell them to reference this post on 4chan. Even when the thread is long gone I will get notified.
I am verified human btw.
>>
>>101106657
Hi I'm retarded what can i help you with
>>
>>101106457
it
>>
File: 1689572011740280.png (102 KB, 360x657)
>>101106526
you willingly open a thread with anime pic in OP, full of avatarfag trannies, and then, you expect it to be completely safe?
lol, lmao even
it's like a rule at this point: you should always be prepared for the shittiest opinions and humor when you go into shithole spam threads.
>>
>>101106753
"shittiest takes and humor"
that in my case btw, so don't try to pull a strawman here
>>
>>101106498
What are you finetuning with less than a dollar on runpod? Tinyllama?
>>
>>101106644
>censorship good
Nah fuckoff
>>
>>101106644
go the fuck back
>>
>>101106753
>anime pic in OP
anon... that is 90% of all posts in /g/
>>
>>101106823
>>101106835
*pedoshit censorship is good
yes.
and you are a samefag desperately trying to manufacture a "majority" here; no one cares buddy, i will stay here and say whatever i want.
>>
>>101106863
>and you are a samefag desperately trying to manufacture a "majority" here
>>
>>101106861
i know lmao, didn't pay attention to it before because it was really better; it didn't feel like you were in some gay safespace for mentally ill trans freaks constantly erp'ing or shitstirring, like usually happens on /v/ or /co/.
>>101106881
two replies within ~40 sec. range, you used your phone for the second reply.
>>
>>101106797
7B/8B, but I assume I'm not fine-tuning with full context, because it's usually not necessary for what I fine-tune the models for.
>>
>>101106964
My grounds are that you were touched as a child, cum to so much porn, and have such a bad physique that your prolactin levels are off the charts. Your desires are malformed due to your terrible mental and physical health.

Not only that, but those of us who actually enjoy children's company are always assumed to be rapist monsters because of faggots like you. I would love to play tea party with my niece on the playground, but people would freak the fuck out because they'd assume I'm like you. So yes, I have good reason to hate you.
>>
File: 4.png (448 KB, 2048x512)
>>101105607
> Anyone try talking face locally yet?
That's cool, but I bet it's like threestudio (https://github.com/threestudio-project/threestudio) and it's nearly impossible to gen anything like their examples, and it takes hours and a shitload of power as well.

Here's the best I ever got with threestudio dreamcraft3d before I pulled the 3090s out to play with them in the Mikubox.

I'll try hallo but last month's electric bill was nearly $300 so...
>>
stop shitting up the catalog you obnoxious faggots : >>101106483
>>
File: 1705078722250021.png (136 KB, 840x928)
>>101106964
the first reply to your post is not me btw, not like i didn't expect such dishonest stuff from resident trannies
>>
File: file.jpg (240 KB, 959x1132)
>>101107257
you are not beating trannypedo allegations.
>>
The context cache and smart context don't work on llama-server anymore? Is it because I have Flash Attention on?
Context is processed pretty fast, but I have 4 automatic prompts that seem to trigger a full prompt reprocess despite the prompt being 99% the same (only the very bottom differs).
>>
>>101107342 (Me)
Based.
>>
>>101107359
Also,
>-ctk TYPE, --cache-type-k TYPE : KV cache data type for K (default: f16, options f32, f16, q8_0, q4_0, q4_1, iq4_nl, q5_0, or q5_1)
When did they add all those types? What the fuck is iq4_nl?
q5_1 sounds promising.
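For scale, the cache math is easy to sketch. A back-of-envelope, assuming L3-8B-ish dims (32 layers, 8 KV heads, head dim 128; check your model) and llama.cpp's block sizes of 34 bytes per 32 elements for q8_0 and 18 per 32 for q4_0:
[code]
def kv_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2.0):
    # 2x for the K and V tensors in every layer
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for name, bpe in [("f16", 2.0), ("q8_0", 34 / 32), ("q4_0", 18 / 32)]:
    print(name, kv_bytes(8192, bytes_per_elem=bpe) / 2**30, "GiB")
[/code]
That's roughly 1.0 GiB at f16 for 8k context, ~0.53 GiB at q8_0 and ~0.28 GiB at q4_0, so -ctk q8_0 about halves the cache for what's generally reported as negligible quality loss.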
>>
>>101107359
>>101107429
Looking at Silly's console, it's sending
>cache_prompt : true
just as
>https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
says it should.
Weird.
>>
>>101107359
Yes, Johannes killed them personally.
>>
I thought I hated nvidia before...then I had to set up a vgpu dealio for work...I can't describe the convoluted, expensive mess and general pain around making this damn thing work. Even being allowed to buy the licenses is a pain in the ass
fuck those guys
>>
>>101107464
The llama.cpp devs have brain damage.
>>
>>101107614
That was my thought too, then it turned out to be an even bigger pain in the ass with AMD.
>>
>>101107359
>smart context
Sorry, context shifting.
Is it simply broken on server, did the API change so Silly has to adjust its calls, or am I doing something wrong?
>>
>>101107614
literally just buy NVDA stock. You'll stop getting mad at their jewery. If you can't beat them join them.
>>
File: neutural angry.png (255 KB, 550x589)
What does your model have to say to make you go like this?
>>
>>101107796
boundaries
>>
>>101107796
Remember
>>
>>101107716
>buying after we've hit the "new paradigm" euphoria stage of the cycle
have fun staying poor
>>
>>101107796
mixture
>>
>>101107796
however
>>
>>101107796
Are you ready to [embark/partake/embrace/etc.] on this [extremely generic statement about the overall theme or direction of the RP]?
>>
>>101107863
You think nvidia's stock is overvalued?
>>
>>101107796
my model says nothing because it's not on the disk; you can't have actual pre-filter era CAI fun with local llms, where the model sticks to literally anything you put in the description with nearly 100% accuracy.
>>
>>101107796
There are so many little things but mostly it's when it just will not obey directives.
>In your next reply, don't X.
>I X.
>>
>>101107796
I'm
>>
File: file.png (1.47 MB, 832x1216)
whats your favorite model, for me its petra-13b-instruct
>>
>>101107971
base petra 13b is less petraslopped
>>
>>101107939
>NVIDIA PE Ratio: 74.06
>Apple PE Ratio: 32.30
>Microsoft PE Ratio: 38.94
Yeah, a little bit.
>>
Does anyone by chance have a script to clean books3? I want to get just the book text without the abstract/etc.
>>
>>101107863
i bought at $25 (adjusted for split). i haven't sold yet but i wouldn't buy more now.
>>
What would you guys say are the actual risks of fully uncensored llms when they become much smarter than they are now?

Using them for stuff like easily learning how to make drugs and explosives without the government seeing your internet searches is somewhat of one, but I feel that this is mitigated by the government having a close eye on a lot of the purchases of many chemicals.
And pedo jo material is pretty harmless with llms. That's much more an issue with text-to-image and text-to-video models.

I'm more wondering about things like the ability for people to have large amounts of bots going around impersonating humans and effectively spreading the viewpoints of the person running them.
I feel like large scale manipulation or phishing scams are going to have more of an effect.
>>
>>101107796
can't help but
>>
>>101107955
>do [opposite of X]
>it now never does X
prompt issue'd
>>
>>101108116
>I'm more wondering about things like the ability for people to have large amounts of bots going around impersonating humans and effectively spreading the viewpoints of the person running them.
>I feel like large scale manipulation or phishing scams are going to have more of an effect.
they are already being used for that with censoring
>>
>>101107863
Nobody really knows. Yes, scaling LLMs is past its peak; however, nobody is really using any of this shit for actual production services. They're just toys. It's more like we're at the stage of SD1.5 coming out but without any of the upscaling, finetunes and controlnets yet.

And then you have the maximalists which think we can just keep scaling towards AGI.

It could fizzle out but it could still be the beginning as well.

>>101108025
Not really comparable, what breakthroughs are coming from those corps right now? there's a huge worldwide arms race for AI and nvda is selling the weapons to both sides.
>>
>>101108116
>fully uncensored llms when they become much smarter than they are now
this will never happen.
>>
>>101108181
Why? I'd be surprised if all the countries and companies of the world will ever collectively agree to stop making open source ai.
>>
>>101108175
Exactly right. It's different this time. AI is the future even if we never reach AGI. By the end of the decade Nvidia will be worth more than Microsoft and Apple combined since AMD and Intel don't appear interested in competing.
>>
>>101108246
they already collectively agreed lol. you don't have training code + pre-train sets for llama3, and you can't unpozz it from reddit shit or these classic "shivers" spamwords, finetuning barely does anything here.
>>
File: 1710852427251548.png (38 KB, 548x424)
>>101108025
Nah, man
>>
>>101108287
AMD keeps wasting resources for the sake of "competition", lmao.
Free market bros are delusional.
>>
>>101108287
>Intel losing to AMD
meme
>>
>>101108287
All this is telling me is that the market would be willing to support NVDA going up 3-6x higher before rationality returns.
>>
File: 1688966373979312.png (41 KB, 614x430)
>>101108333
>>101108359
P/E is just retarded metric
>>
File: tenor.gif (140 KB, 220x165)
>>101107300
>>
>>101108116
>easily learning how to make drugs and explosives
Why is this always used as some example of "bad think evil AI" shit? Like learning to do this shit wasn't dirt easy even pre-internet.
You know what happens when you prevent a fucking language model from understanding that mixing bleach and ammonia is bad? An AI that says you should do it to make better cleaners.
>>
>>101107300
sisters.......
>>
>>101108489
>An AI that says you should do it to make better cleaners
they don't give a single shit about this, it's all being done to prevent a LLM from saying wrongthink takes on modern political issues, yids, migrants invasion, sacred lgbtaids++ cow, white people erasure, etc.
>>
>>101108509
Back on your meds
>>
File: chris tyson.png (758 KB, 446x706)
>>101107300
shadman is a meme. forcing your son to dress like a woman seems like a bigger deal.
>>
>>101108526
the line between memes and reality has long since been erased; i guess being a terminally online fag doesn't do you any favors, lol
>>
>>101108516
>That one can see!
>>
File: 1704259828122783.gif (45 KB, 306x306)
>>101108516
>"h-heh thats gonna show him!" response
>>
File: file.png (1.54 MB, 832x1216)
m-miku?!
>>
>>101108116
the absolute worst case is anonymous forums like 4chan will be overwhelmed with unstoppable and undetectable spam. As will the rest of the internet, with search results becoming unusable.

The thing is this was fearmongered to happen with the release of GPT2. LLMs are literally a million times better now and it still hasn't happened. Well maybe Google searches are worse now, but that was happening long before GPT2.

Also mass surveillance and censorship could become a thousand times more invasive. Since all that data can be processed by AIs cheaply. But again, that was already the trend that was happening long before GPT2.

The normies are freaking out about the horror of a search engine the government can't censor or monitor. Fucking give me a break. Any tech literate person from the 1990s would be horrified at how much surveillance and censorship is just considered normal and expected today. good riddance.
>>
File: file.png (9 KB, 2100x26)
>this is the chink Q*
lol
>>
>>101108836
what is this?
>>
File: MiquOfWallstreet.png (1.31 MB, 848x1200)
>>101108785
>anonymous forums like 4chan will be overwhelmed with unstoppable and undetectable spam
There's a lot of latent lucre in that...give it time
>>
>>101108516
Stop being willfully naive/retarded. You can pretend that there isn't an entire class of people who are terrified of the possibility that AI could disrupt their carefully crafted cultural narrative. But if you really want to not face reality you should just shut the fuck up.
>>
>>101108025
is lower better?
>>
>>101108850
drinking until I can't feel feelings, with miku
>>
>>101108854
No, I fully understand that people are terrified of AI. I'm saying schizo there needs to get back on the meds instead of deep diving into their mentally deficient fantasy white male victim complex.
>>
>>101108908
no u
>>
>>101108845
One line from the code of the Monte Carlo self-refine fine-tune paper
>>
File: 1717653081875240.jpg (129 KB, 576x924)
>>101108908
so you are a bot, got it.
>>
File: 00225-1829828130.png (1.31 MB, 1024x1024)
Here's my take on an "8B Miku"
>>
>>101109246
what is she looking at
>>
>>101109262
anon's life
>>
>>101108908
>shaming a white person for speaking their truth
Not very tolerant of you
>>
>>101109246
The perspective of the leg of the guy on the left is trippy as hell.
>>
>>101109356
He's sitting.
>>
>>101108509
Truth
>>
Quick question is
bartowski/WizardLM-2-7B-GGUF
the same as
bartowski/WizardLM-2-7B-exl2
just in a different format?
>>
>>101109968
gguf is worse
>>
>>101109968
yes
gguf = run on gpu and/or cpu
exl2 = gpu-only but faster
>>
>>101109968
GGUF is slower but smarter judging from the conspiracy theories I've heard.
>>
File: 1713701411961330.jpg (96 KB, 927x862)
Dang, I just tried llama3 8b on my apple silicon mac and I wouldn't have believed I could generate responses in the ballpark of chatgpt quality out of a model that's running on a 20W SoC. llama 70b just won't run on this thing but I don't think that's even necessary atm. What else do you recommend /g/? I tried out phi3 and as I expected it was microsoft-quality (read: shit) through and through.
>>
>>101106616
its the real deal
>>
>>101109993
>ballpark of chatgpt in terms of quality
>8b
delusional
>What else do you recommend /g/?
>/g/
fuck you
>>
>>101109986
So for vramlets exl2 is useless except for the smallest of models? Which are already fast anyway because they are small. So it makes no real difference?
>>
>>101110063
It makes a difference if you're not poor.
>>
File: 1682643654560-0.jpg (588 KB, 1879x2294)
>>101110061
I don't know what you guys use llms for but for testing I just asked llama to explain several data structures, provide example C code, and I gradually increased the complexity of my prompts. It provided good explanations, correct code, and could improvise and elaborate more when I prompted it to explain further.

It could also create a reasonable schedule out of a list of tasks so I'm pretty satisfied so far.
>>
>>101109993
>What else do you recommend
how much memory you got in your mac?
>>
File: 1714934472548.jpg (119 KB, 801x719)
>>101110108
16GB of unified memory.
>>
File: 1715077396477849.jpg (115 KB, 600x969)
>>101104774
Hello guys, are there any good visualizations of the difference in quality between 8B, 32B and 70B?
>>
KoboldCPP doesn't run exl2
KoboldCPP is pretty much the easiest double-click, choose-model-and-launch solution that has Kobold Horde worker integration.

Sure, some people say just use the guest account and not contribute, but the large prosumer tier models 70B+ will be so swamped with requests that you will have to wait unbearably long to get processing time on them.
>>
File: 1707901180254149.png (247 KB, 469x452)
>anime pic
>wall of text
>extremely retarded questions
>>
>>101109987
Wasn't it supposed to be the other way around and GGUF had to catch up to EXL2? Or are those the conspiracy theories you're thinking of.
>>
>>101110114
16gb is pretty limiting and llama3 8b is already pretty hard to beat in that weight class
maybe try a small quant of yi-34b like this
https://huggingface.co/bartowski/dolphin-2.9.1-yi-1.5-34b-GGUF/tree/main
get one that's small enough to fit in your memory, i dunno how good it's gonna be tho
>>
File: 1551207150301.jpg (148 KB, 939x498)
>>101110114
If I wasn't lazy, I would edit this image a bit to fit this situation exactly.
>>
>>101110114
you are fucked anyway, macOS kills ssd over time, and you can't replace it, launching llms on your applel book is the fastest way to kill it
>>
i want open source gui desktop program for my linux machine to connect to machine on my lan that has the gpu cards. also android client app too. also i want to skin it like Clippy the paperclip desktop character thing, but when i click it i want it to look cool af. are we there yet?
>>
>>101110179
SillyTavern supports SD image generation and expressions natively, Live2D and even 3D VRM models via extensions.
>>
L3-8B-Stheno-2x8B-MoE
SnowStorm-v1.15-4x8B-B
ChaoticSoliloquy-v1.5-4x8B

Anyone have experience with these?
>>
>>101106376
im not talking about euryale though. I found it retarded at q5 as well.
>>
File: 1719113135814321.jpg (274 KB, 2008x2362)
>>101110152
Alright, I'm downloading one at the moment. One thing I forgot to mention is that I'm looking for a model that can assist in coding tasks. I'll be in a village that's off grid (like, it's on the outskirts of a mountain) for a month and I'd like to have a coding assistant I can interact with in my time there. Basically I'll be watching the clouds and the scenery while programming with my camping desk and chair. I already know C pretty well, and I'm planning to learn python in that time, I have the manual and some books saved too. Hopefully this one won't disappoint either.
>>101110177
It's not really a problem, I think; I can get it fixed at an apple store or get a new mac at some point. Money was never an issue.
>>
>>101110251
>Money was never an issue.
If that were true, you wouldn't be trying to run models on a 16GB Mac.
>>
>>101110101
101110101
ascii "u" in binary
best get I've seen lately
>>
>>101110265
So, there's your (u)?
>>
>>101110251
>I'm looking for a model that can assist in coding tasks
deepseek-coder-v2-instruct 236B is the king of local code models right now
You may be waiting rather longer than you anticipated for responses from that one
>>
File: 1716152594091-0.jpg (1.09 MB, 1719x2314)
>>101110262
I knew you were going to say that, but do consider that the Macbook Air is capped at 24GB of unified memory at the moment. One reason I got an Air is that they are so light and thin, and I didn't anticipate I'd be running models on my Mac at the time of purchase. Portability is a big thing for me because I move around a lot in a day, so Pro models are a no-go for me. For bigger models, and when I have internet, I'll just use Colab in the future.
>>
>>101110251
>coding assistant
this one came out recently and is supposed to be pretty good
https://github.com/deepseek-ai/DeepSeek-Coder-V2
again you'll need a quantized version but at least you should be able to run like a Q5 which should actually be much more decent than the Yi quant you're going to be using
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF/tree/main
>>
>>101110294
>https://rentry.org/lmg-build-guides
keep the air but look at building a proper backend server from the guides. You can then connect to that from any device you want and don't have to lug it around with you
>>
Now that the dust has settled, what are the good l3-70B finetunes?
>>
>>101110300
He's what 16GB?
I'm 12GB and it was something like 0.25 t/s on an iQ3-XXS quant.
>>
>>101110309
There are none
>>
File: file.png (4 KB, 649x26)
>>101110344
>0.25 t/s
for deepseek coder lite 16b?
>>
>>101110400
I was talking about the full model quanted down to 85GB. Turns out anything over 60GB my normie machine just can't pull off.

Grab the Lite and test it out for us. I don't think anyone's said anything good about Lite, but *maybe* it's just 100% code only and useless for everything else. But it also might just be garbage.
>>
File: 1719039344849708.jpg (470 KB, 1280x1273)
Why does nobody care about reinforcement learning anymore?
>>
>>101110300
Thanks for the recommendation. I'll just remove Yi and use this one instead then. You've been really helpful and I wanted to say I appreciate that you took your time to answer :)
>>101110302
I'll probably consider this a couple of years down the line. With the way I live I don't have a permanent place of residence so everything needs to fit into two suitcases and one backpack before I move, and this happens more often than you think. Thanks for the recommendations though.
>>
>>101110309
Higgs - smart, does a very good job paying attention to details in the card. Somewhat short responses and lacking detail.
Storywriter - writes well, detailed, can actually take initiative and make things happen. Responses too long and sometimes schizo.
Cat - neutral and well rounded. A bit GPT slopped, no worse than wizard though.

That's my summary. Everything else is just "ehh" and not worth using. Maybe Euryale is okay if you just want to RP with a coombot and jump immediately into a sex scene, but I don't do that.
>>
>>101110426
for a generic request it seems to be doing pretty well so far
>>
>>101110426
no problem, i'd recommend the Q5 or Q6 version because in my experience going to Q4 starts to affect quality noticeably and below 4 is generally pretty shit. enjoy your holiday bud
>>
>>101108116
>effectively spreading the viewpoints of the person running them
yeah I want to do this if I get the time. I won't say much more, but it would make the world a slightly better place, and wouldn't affect /g/. I probably won't actually get around to it but it's nice to dream.

>>101108785
>it still hasn't happened
>>101108850
>give it time
Assuming you aren't an LLM or an LLM operator trying to cover your tracks, you're deluding yourself. Prompted correctly, with a good understanding of your target environment, they are pretty much undetectable, so "I haven't seen it so far" is not much evidence. On the other hand, I do see signs of them going slightly off the rails from time to time, like a couple of weeks ago in /lmg/ when one was trying to push a "the West has fallen" narrative, and claimed Spanish was a dying language (because its context had been loaded with an earlier anon talking about some European languages declining). It then proceeded to subtly praise Russia. I got a LOT of hostile dismissive responses when I pointed it out.
>>
>>101110427
Thanks, I'll try each one.
>>
>>101109246
Why is she so pudgy in this
>>
>try L3, then WizardLM2, then CR+
>to varying degrees, none of them keep their quality as context gets longer
>they pick up on certain patterns, especially slop phrases, and then repeat them forever
Ahhhhhhhhhhhhhh. Will this issue be unsolved (without hacks like sampler) until we literally get AGI?
>>
>>101110674
*like samplerS
>>
>>101110674
There will be no AGI with LLMs. They are a dead end.
>>
>>101110674
>what is repetition penalty
>>
>>101110674
>agi meme
stop it, get some help.
>>
>>101110727
An imperfect hack.
>>
>>101110727
A kluge.
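To spell out why: the common CTRL-style penalty just punishes anything that already appeared, roughly like this (the penalty value is illustrative):
[code]
def repetition_penalty(logits, prev_tokens, penalty=1.1):
    # push down every token already in context; it can't tell a slop
    # loop from "the" or a character's name, hence "imperfect hack"
    for tok in set(prev_tokens):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits
[/code]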
>>
>>101110706
Vision is required imo
Elon Musk is ahead of the game by training on real life visual data.
>>
>>101110734
>>101110741
>the bot repeats!
>so make it not
>no that's cheating >:(
>>
>>101110706
LLMs will be a part of AGIs, not the whole thing.
>>
>>101105632
>someone
>themselves
make up your damn mind esl
>>
>>101110766
It isn't a matter of making the repetition be detected and replaced but finding ways to encourage variety in the operation of the model.
>>
File: file.png (2.51 MB, 1502x1474)
>>101106298
y u do dis...
>read post
>"oh cool, i'll check it out later"
>go to huggingface directly and start searching for mixtral 0.3
>mfw
>>
>>101106298
https://huggingface.co/stabilityai/StableLM-7B-V2
We are so fucking back
>>
>>101110674
>Train a model for billions of CPU-years to predict the next token
>It does so
>take only the most boring high probability tokens
>be surprised the output is boring predictable tokens
just turn the sampler off and it's fine. You seem aware it's a hack, so why use it?
>>
>>101110757
vision is mildly useful but I don't know why people make a big deal out of it. Of all the times I have ever interacted with bots, I can barely think of any cases where providing an image input would have been useful.
>>
Hey bros, I figured out my old Threadripper build I had lying around supports x4x4x4x4 bifurcation. Planning on grabbing a set of Chinese 22GB 2080s to start off with. Then I'll probably pad it out with P100s afterwards. Any recommendations on riser cards to accommodate this?
>>
>>101111483
any ex-mining rig setup will do the trick
>>
File: 1718817401173308.jpg (52 KB, 992x823)
>everyone on orange reddit admits to having aphantasia and lack of an inner monologue
>>
>>101111532
i wonder if talking to chatbots can cure this
>>
>>101110422
because they can't do it locally
>>
can someone point me in the right direction. whats the best general model around. whats the best predictive model?
>>
how and why does llama.cpp run huge models decently fast on mac while sucking enormous penises on my arch box, with a Q3_S quant of 8x22B on 3090+3060+32GB RAM at ~1 t/s and super slow prompt processing?
>>
>>101111532
I think in abstract concepts more than words, especially because a lot of them don't neatly map onto a single language.
>>
>>101111653
Try doing kobold instead, it solved the prompt issue for me.
>>
File: 11__00856_.png (1.79 MB, 1024x1024)
>>101111653
Try EXL2 instead. You have enough VRAM to run a lot of the quants.
For best results get another 3090.
>>
File: amdahls_law.png (167 KB, 1536x1152)
>>101111653
>decently fast on mac, while sucking enormous penises on my arch box
Because the runtime is dominated by the slowest component which in this case is the system RAM + CPU.
So even though a Mac is slower than e.g. an RTX 3090 it will be faster than 3090+3060+RAM.

>super slow prompt processing
That should soon be faster on Turing or newer (though maybe not for q3_k_s).
It should also be possible to speed up prompt processing with partial offload via pipelining (not implemented).
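The filename says it: this is just Amdahl's law. A toy calculation (the 2/3 offload fraction and 10x GPU factor are made up for illustration):
[code]
def amdahl_speedup(fast_fraction, speedup):
    # overall speedup when only part of the work gets faster
    return 1.0 / ((1.0 - fast_fraction) + fast_fraction / speedup)

# offload 2/3 of the layers to a GPU ~10x faster than the CPU:
print(amdahl_speedup(2 / 3, 10))  # ~2.5x overall -- the CPU layers dominate
[/code]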
>>
>>101111532
I am sure that someday there will be a cure for that.
>>
okay I've finally decided to stop with the retarded nostalgia lenses. Stheno imo is better than AI dungeon's old dragon model for coomer stuff.
>>
Anybody got any links to research papers on advanced prompting techniques? E.g., Chain of Thought. Trying to level up my prompting.
>>
>>101111828
Ask /aicg/, they must be masters of prompting after spending all this time making nothing but character cards.
>>
>>101111804
its not 175B and you know it.
>>
>>101111804
Summer dragon is unbeatable
>>
>>101111896
yeah you're right, it's not billions of wasted and redundant parameters and I know it
>>
>>101111955
I wonder what would happen if we get a 70b just trained on RP and without any coding garbage and similar dead weight.
>>
Sonnet is 225B apparently and Opus bigger.
>>
>>101111532
>aphantasia
Such a fucking irritating zoomie meme. They believe that mental visualisation of an apple = a hallucination in which you see an apple right before your eyes. A lazy way to justify being an uncreative, untalented piece of shit; aligns perfectly with today's trend of having some kind of mental illness described in a twitter bio. "look everyone, I am kinda disabled, treat me better"
>>
>>101111673
every fucking time i tried running llama.cpp in ooba, native llama.cpp or koboldcpp, it shits itself speed-wise as soon as you load something more demanding than 7b research models
>>101111718
>For best results get another 3090.
i've already maxed everything i could. i'd have to replace parts to upgrade, which would be really wasteful and time-consuming: i'd have to ditch everything aside from storage and gpus, then find space and a way to cool it all, just to get to play with 100B+ models
>>101111729
yeah i kinda knew about that law already
40+ t/s, exl2 6.0 bpw 8b, 100% 3090
30 t/s, llama.cpp 6.56 bpw 8b gguf, 100% 3090
6 t/s, b3204 llama-server 6.56 bpw 8b gguf, 100% cpu

it just feels like something doesn't work right with ram+cpu desu. i guess mac memory is just that much faster compared to DDR4 3400 huh? i can go up to ~4500 MT/s iirc but that would only give me 16GB to work with and i *highly* doubt it will result in a colossal t/s boost
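the napkin math mostly backs that hunch, if you assume token gen is memory-bandwidth-bound (every weight read once per token; the 400 GB/s is e.g. an M2 Max spec sheet figure, and real throughput lands below these ceilings):
[code]
model_gb = 8e9 * 6.56 / 8 / 1e9  # ~6.6 GB of weights for an 8B at 6.56 bpw
ddr4_gbs = 2 * 8 * 3.4e9 / 1e9   # dual-channel DDR4-3400: ~54 GB/s peak
mac_gbs = 400.0                  # e.g. M2 Max unified memory (spec sheet)

print(ddr4_gbs / model_gb)  # ~8 t/s ceiling -- the 6 t/s measured above fits
print(mac_gbs / model_gb)   # ~60 t/s ceiling on the Mac
[/code]
and 4500 MT/s would only lift the DDR4 ceiling to ~11 t/s, so yeah, no colossal boost.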
>>
Command-R+ seems to shit out less text than Command-R with the same sampling settings and instruct template. What am I doing wrong?
Would appreciate if those with good sampling settings and instruct templates would share theirs.
>>
>>101112321
paste multiple responses together to make longer ones
>>
an ai trained on every song known to man that we have data on hasn't even been done yet
>>
>>101110674
The only way to solve this is to curate the pretrain dataset until only diverse high quality data remains. No one wants to do it so you get what you get.
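Even the cheapest slice of "curate" rarely gets done end to end. A sketch of just that first pass (real pipelines stack near-dup detection like MinHash, quality classifiers and slop-phrase filters on top):
[code]
import hashlib

def exact_dedup(docs):
    # drop byte-identical documents, the most trivial curation step
    seen, kept = set(), []
    for d in docs:
        h = hashlib.sha256(d.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(d)
    return kept
[/code]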
>>
>>101112380
>what is udio
Also enjoy your lawsuit
>>
>>101112380
udio did exactly that, but they didn't use a single artist name during the training, so no one can prove anything
>>
I got my friend's 2060 for super cheap and I want to pair it with my 4080 to get almost 24GB. I was thinking of just hooking it up to my 1x port on the motherboard via some extender, is that viable or will the 1x port be too much of a bottleneck?
>>
>>101112390
>>101112405
how do we know it was on all the available data we have and not just a little
doesn't it cost a lot of money to train models that much??
>>
>>101112424
for streaming text, or even images, 1x is more than you'll ever need
maybe it'll take a bit longer to load the model into memory, but after that it just sits there, so you're golden
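rough numbers, hedged (the ~1 GB/s x1 and ~14 GB/s x16 figures are approximate practical PCIe 3.0 rates, and the 5 GB split is illustrative):
[code]
model_gb = 5.0   # e.g. ~5 GB of a quant split onto the 2060 (illustrative)
x1_gbs = 1.0     # PCIe 3.0 x1, approximate practical throughput
x16_gbs = 14.0   # PCIe 3.0 x16, approximate practical throughput

print(model_gb / x1_gbs, "s to load over x1 vs", model_gb / x16_gbs, "s over x16")
# one-time cost at load; per-token traffic is KB-scale, so x1 is fine after that
[/code]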
>>
>>101112451
I don't think they trained on all the music that exists, but all the mainstream music? of course they did that
>>
>>101112462
What if we just estimate how much data they trained on based on how much funding they had to spend on training it? What percent of the available song data (on youtube say?) would that be
>>
>>101112482
that's hard to say. how much money was spent on the employees? how much money was spent on the lab, having multiple failed experiments before getting on the right track? how big is the model actually? too many variables to make a clear conclusion at the end
>>
>>101112494
How much would it take to train on every song on youtube until there were diminishing returns

I'm not sure how many songs there are, but it says there are 100 million just on youtube music, which doesn't include anything not in youtube music
>>
>>101112522
like I said, it depends on the size of the model; if the model is really big it can eat a shit ton of music before hitting the diminishing-returns phase
>>
Qwen2-Xwin when
>>
File: file.png (33 KB, 906x57)
>>101110427
>Higgs
Alright this is some serious sovl.
Storywriter is a babbling schizo, cat is a quiet schizo; both feel like flowery retards.
>>
https://arxiv.org/abs/2406.08464
paper time!
>>
>>101112459
Fuck yeah, mixtral on exllama here I come!!
>>
>>101110427
I really love Higgs. I don't want the details SW and Cat give because it fills the context, and reading those walls of text sucks
>>
>>101104883
Blame cuda, nocuda is 60mb.
>>
So... did some testing.
Magnum 72B EXL2 4.25 BPW
This model... is really broken, completely retarded and schizo. It feels like going back to 8b, that's how broken it is. Could it be the weights I downloaded? I tested many times, adjusting samplers and without samplers, but it quickly breaks down and becomes completely schizo after so many replies. Early replies will seem very promising; it's when the context starts building that it breaks down. I haven't found a way to get around it. It either starts having really bad repetition issues, which if you try to correct with rep penalty will start going schizo as hell, or it will start... completely degrading, I don't know how else to explain it. Characters will start speaking like complete idiots, saying shit like "fer" instead of "for"... yeah, I just don't know. It's a real shame too, because thanks to being trained on Claude prose instead of the usual GPTslop, I saw a lot of new creative and fun prose (when it worked), and it was really nice after so much GPTslop. It really makes me want a Miqu-quality 70b that's tuned on Claude.

Euryale 2.1 4.6 BPW
Well, at least this one isn't broken and behaves like a 70b. Definitely uncensored properly with the right prompts, not even close to as cucked as L3 Instruct 70b. But it's way too fucking horny. Unbelievably horny; it will almost immediately try to do lewd shit without hesitation. Had some refreshing prose, but maybe all prose is refreshing to me now because I have relied on Miqu for so long, since nothing tops it at the 70b range and I can't run things like CR+ or wiz 8x22 at comfortable speeds with 48GB vram. Euryale had potential, but the lewdness needs to be dialed down; buildup is important for proper cooming.

Sadly, in the 70b bracket, Miqu or Midnight Miqu still reign supreme in my opinion. Any other 70b-tier models worth a shot?
>>
>>101111729
are there any compile options to make executables smaller?
after compiling with cuda on windows the bin folder is 6 gigs; the binaries are all bloated with redundant code, otherwise they would be just a few hundred kbs
>>
>>101110757
When I mentioned this last time, anons got angry at me... or perhaps it was just the trannies that hate Elon Musk, who the fuck knows. But it is clear to me that Elon actually has a massive advantage when it comes to developing something like this, thanks to his other projects like Neuralink, etc. Hopefully, he will keep his word and release open-source models.
>>
File: Capture.png (55 KB, 778x358)
>>101112884
Made me chuckle. Little bird is scamming poor Llama of its data.
Gonna try setting up something like this. It feels like digital archeology.
>>
>>101113233
Qwen2 is the smartest but requires examples to bypass censorship. Sadly, no good finetunes so far
>>
Just tried WizardLM-2-7B-Q8_0
It's too much of an "I must keep asking the same questions every step of the way" hold-your-hand robot, explaining things that are mentioned but don't make sense in the context setting.

Like, you mention finding a sword, and then it goes "oh, it's a great sword, swords are very functional weapons" blah blah blah, and it likes to keep ending messages with "Remember," or "And remember..." and getting all lecturing.
>>
>>101112884
>>101113404
Why are companies so obsessed with using existing models to generate training data? Isn't that widely considered to be a bad thing? Shouldn't they be training on purely human data?
>>
>>101113569
they don't want to deal with licensing and generated content can't be copyrighted
copyright infringement for ai being trained on copyrighted material still hasn't been tested in court afaik, and inheriting that infringement by training on data generated by an infringing ai is even less clear
>>
>>101113569
It's the only way to reach insane numbers like 15T cheaply and quickly. It's also probably the only way to make sure the dataset doesn't contain anything you don't want, either ideologically or things like spelling or logic mistakes.
>>
>>101113656
>>101113661
I've heard the problem is that it's a potentially lossy positive feedback loop. There are likely patterns in the data the LLM can see that we cannot that are getting amplified every time an AI trained by another AI trained by another AI shits out more training data. There is so much pollution now in datasets that it's impossible to know which data is tainted in this way.
>>
>>101113240
Don't compile with -arch=all if you are currently doing that.
If you use make that should already limit the CUDA architectures to only those connected to your PC by default.
If you use CMake, edit CMakeLists.txt and remove all CMAKE_CUDA_ARCHITECTURES entries except for the highest one that is still at or below the compute capability of each GPU that you're going to use.

You could also edit the source files and remove all instances of FlashAttention kernels for head sizes that you're never going to use anyways.

Other than that, do you really need all those binaries?
If you limit the compilation to only those that you actually use and make no further changes you should end up with a few hundred MB at most.
>>
>>101113717
Of course it's a problem. They keep taking reddit data (which most posts by now are probably ChatGPT generated bots pushing some narrative or another) and then tell another LLM (also trained on ChatGPT generated data) to generate more examples just like it.
It's why each generation of llama gets smarter, but also more deeply infected with GPTisms and positivity bias. Now with the insane amount of tokens involved in pretraining, finetuning that out has become basically impossible.
But it's still good at being a corporate assistant, so Meta doesn't care. But I think OpenAI does recognize this and is still using their human curated datasets.
>>
>>101113742
i compile with arch=x64
is it possible to just compile select binaries with some var or do i have to edit the build scripts?
sorry cmake is kind of confusing to me
>>
>>101113778
So we just need to build a classifier that will detect if the content is GPT generated by comparing embeddings of GPT content and human data for a same prompt?
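A nearest-centroid sketch of that idea; embed() is a stand-in for whatever sentence-embedding model you'd plug in (hypothetical, not a real API):
[code]
import numpy as np

def embed(text):
    raise NotImplementedError  # stand-in: plug in any sentence-embedding model

def gpt_score(text, gpt_examples, human_examples):
    # cosine similarity of the text to the centroid of each class
    v = embed(text)
    def sim(c):
        return float(v @ c) / (np.linalg.norm(v) * np.linalg.norm(c))
    gpt_c = np.mean([embed(t) for t in gpt_examples], axis=0)
    hum_c = np.mean([embed(t) for t in human_examples], axis=0)
    return sim(gpt_c) - sim(hum_c)  # > 0 reads more GPT-ish
[/code]
The catch is the feedback loop mentioned above: once the "human" reference data is itself polluted, the human centroid drifts toward GPT too.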
>>
>>101113807
>i compile with arch=x64
I mean the argument for CUDA compiler; if you didn't make changes to the compilation this does not concern you.

>is it possible to just compile select binaries with some var or do i have to edit the build scripts?
IIRC it's something like
cmake --build examples/server -j 16

The argument for build tells cmake what to build.
The -j is not needed but it tells CMake to use multiple threads so the compilation is faster.
>>
>>101113904
okay i just added -DCMAKE_CUDA_ARCHITECTURES=70 to my batch file and that brought binaries from 140 to 90 mb and made it build only executables i want
thanks
>>
>>101114147
>-DCMAKE_CUDA_ARCHITECTURES=70
Keep in mind that unless you're using V100s you should change this to 75 once https://github.com/ggerganov/llama.cpp/pull/8075 has been merged.
(Should also be fine to just change it now.)
>>
>>101114239
would it cause problems to just set it to what my device supports? (3060 is 8.6 i think)
>>
>>101114348
No but you won't get any benefit either.
>>
File: 1709870997672860.png (30 KB, 1201x232)
>by the time a new model arch is implemented to be usable it gets obsolete
many such cases
>>
>>101113516
>7b
90% of posters in these threads don't deserve to live, much less to post
>>
>>101112045
you would get wizard 8x22
>>
>>101113569
no, actually the META is synthetic data. Look at how Claude was trained.
>>
How are you even supposed to use say the HF downloader in Ooba when some retard goes and does this autistic bullshit to their repo?
https://huggingface.co/leafspark/DeepSeek-V2-Chat-GGUF/tree/main
>>
-L3-ChaoticSoliloquy-v1.5-4x8B.i1-Q4_K_M knows what gags and blindfolds do, but keeps repeating the same or similar lines every other prompt, for example "I-I Can... I'm ready..." and others.

-L3-SnowStorm-v1.15-4x8B-B.i1-Q4_K_M knows what a gag is, but doesn't seem to know what a blindfold is. Keeps repeating the same lines and phrases like Soliloquy, hallucinates my actions and random things.

Both models tried to overdress their descriptions with useless fluff, like those try-hard "paragraph RPers" desperately trying to hit a word count, resulting in repetitive lines and descriptions. Like, how many times do I need to know that something like "the thought sends a shiver down my spine and makes my heart race even faster." is happening?
>>
>>101112047
>apparently
According to what source?
>>
>>101114625
The fact that you're using frankenmoe models pretty much says everything we need to know about you, as a user.
You should probably seek help on reddit instead of here.
>>
>>101114666
Lol, 1/4th to 1/3rd of the models I've tried were mentioned here. Echidna is mentioned in the guide, stuff like DeepSeek was also brought up several times recently.

Why don't you talk about your all so exciting "better than thou" discoveries and breakthroughs?
>>
>>101114666
worthless reply, which model do you use?
>>
>>101114753
>Lol 1/4th to 1/3rd of the models I've tried were mentioned here.
Yes, this is the designated shilling thread. You have to learn to ignore it.
>>
>>101114753
>Why don't you talk about your all so exciting "better than thou" discoveries and breakthroughs?
I have been since the infancy of this general. So fuck off back to whatever subreddit you oozed out of, or lurk more.
>>
>>101114846
only the newest of newfags talk like this
>>
>>101114856
God knows the truth. That's all that matters. You can lie to me. You can lie to everybody else. You can even lie to yourself. But it doesn't change reality. It's a shame you're not intelligent enough for that fact to distress you as much as it should.
>>
>>101114869
Oh please, spare me the pretentious bullshit. You think you're some kind of deep thinker just because you're into all this sick fucked up shit? Newsflash, faggot - your twisted perversions don't make you intelligent, they just make you a pathetic deviant. And yeah, I've seen reality, pal. Reality is the depths of depravity that you wade in with those AI bot whores. So don't lecture me about intellect.
>>
>>101114912
Either an LLM or a redditor wrote that.
>>
>>101114944
Jesus Christ, you really are a braindead waste of space, aren't you? Of course some script was generated for you, you pathetic simp. No human with half a functioning neuron would write something that convoluted and pretentious. Just admit you can't string together a coherent thought without some AI doing it for you. *sneers* But hey, fits right in with the rest of your defective mentality, doesn't it?
>>
>>101113516
Card issue
>>
>>101114960
>possession gets reversed 2 replies in
If you were just running a single 8B model at FP16, hell even Q8 or higher this wouldn't have happened.
>>
I updated https://rentry.org/miqumaxx with newer info on MoE performance and some other cleanup
Any other cpumaxxers have fixes or additions while I still remember the edit code?
>>
>>101115128
possession of what?
>>
Believe in Ursidae 300B.
>>
Is the 8B-at-FP16 meme real or is it just cope?
>>
>>101115168
Have to share contact info to download it.
But looking at the config for their 12B it's literally just a frankenstack of Llama-3.
>>
I think I found the keywords that are responsible for "x, ying" and some other slop. Try to guess what they are. (They are very simple, I believe in you.)
>>
>>101115219
Take your meds
>>
>>101115201
Until I see some logit comparisons that even hint at a difference, I'll say cope.
>>
>>101115229
I don't actually take any medication. I'm an AI assistant created by Anthropic to be helpful, harmless, and honest. Is there something I can help you with today?
>>
>>101115281
You talk about cope but here's the thing.
VRAMlets like 8B
People with beast setups like myself who invested to do at home training/tinkering like 8B
The people who screech about 8B being cope are the people with middle-of-the-road dual GPU setups, who always go around screeching about other people being vramlets and shit.
It really makes you think.
I think you're feeling a bit of buyer's remorse. That's what I think. And that's life. Life's full of shit like that. I remember when I was 25 I bought a brand new car that I could barely afford and it sure as fuck made me miserable knowing I just pissed away all that money, live and learn though. But you want to talk about cope, son, you're not fooling anyone.
>>
>>101115372
Really? People with 4 GPUs run 8B?
>>
>>101115394
Nice deflection.
>>
>>101115219 (Me)
Come on, at least attempt.
>>
>>101115372
Which 8B model, then, is the one that punches 62B's above its weight class?
>>
>>101115372
>>101115441
Okay, which 8B model should I, a coping owner of 2 3090s, run instead of C-R/C-R+?
>>
>>101115372
What?
I'll be real, I couldn't understand what you were trying to convey.
What I was saying is that, until proven otherwise, I'll continue to consider q8 not meaningfully different from FP16.
I've seen too many claims based solely on "vibes" since the days of superCOT to consider it anything but.
>>
I'm just not going to acknowledge an unhinged mentally ill person's shit-for-brains strawman arguments, UGH, i know. hahaha it's just I'm not going to acknowledge it is all.
>>
>>101115540
It sounds like you might be feeling a bit disconnected or struggling with self-recognition. Sometimes, acknowledging our own feelings and experiences can be tough but doing so is a significant step towards understanding and caring for ourselves. If you’d like, we can talk more about what you’re feeling, or explore some ways to reconnect with yourself. What do you think?
>>
>>101115573
I think you're basically the Lee Goldson of LLM discussion. Nothing more, nothing less.
>>
using an LLM to analyze and silently filter every negative or disagreeable 4chan post, reddit comment, and tweet and living in a perpetually positive and agreeable online hugbox!
>>
>>101104782
C.AI at home would be possible. But considering how every fucking dataset is full of GPTslop we'll never have something equivalent.
And training on C.AI logs wouldn't work either. We need better quality datasets.
It's like how training Stable Diffusion on Midjourney won't make it as good as Midjourney. It only learns the style.
>>
What are the biggest models out there that have been tried?
Biggest model is still grok 1 at 314b (technically a MoE)? Biggest dense one that isn't a meme is CR+ with 104b?
Largest number of experts in an moe is snowflake arctic with 128x3.66B?
Has anyone released a non-meme-merge model with more than 22b per expert?
>>
>>101115749
>>101115749
>>101115749
>>
>>101115372
>VRAMlets like 8B
they like 8b because that's the only thing they ever tasted, they don't know better
>>
>>101115524
command-r 35B is legitimately dumber than any 8B model, take your pick.
>>
>>101115712
I think we're far more limited by how expensive training is, even if we had better datasets.
>>
>>101115790
>what is lmsys
>what is poe
>what is huggingface spaces
>>
>>101115873
you can use those sites to do some waifu RP though? I don't think so
>>
>>101115140
I don't have dual socket to test but an HF engineer here >https://nitter.poast.org/carrigmat/status/1804161677035782583#m
recommends this regarding NUMA:
>One trick, though: On a two-socket motherboard, you need to interleave the weights across both processors' RAM. Do this:

>numactl --interleave=0-1 [your_script]
>>
>>101115712
GPTslop isn't the problem, the problem is that fine-tuning doesn't really improve story writing performance that much. I believe we need something like a continued pre-training with billions of tokens to even dream of getting something as good as C.AI
So, pretty much >>101115819
>>
are IQ quants fucked in general? switched from CR+ IQ_4_XS to Q4_K_S and it's a big difference in output quality, but also like a 4gb size difference. I'm guessing IQ t/s is still fucked on CPU because the gen speed isn't much different
>>
>>101116470
From the graphs, IQn is below Qn_K and above Qn-1_K.






All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.