/g/ - Technology



/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108956323 & >>108949851

►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: teto principle.png (1.04 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108956323

--Intel Crescent Island GPU's high VRAM capacity and bandwidth specifications:
>108956813 >108956855 >108956867 >108956870 >108956887 >108956903 >108956945 >108956964 >108956979 >108957315
--Comparing mistral.rs and llama.cpp performance on B200 GPUs:
>108956708 >108956745 >108956760 >108956809 >108957775 >108958023 >108958036 >108958048
--Comparing Nvidia N1X memory bandwidth against AMD Ryzen AI Max:
>108958059 >108958069 >108958082 >108958089 >108959414 >108961327 >108961628
--llama-bench results for Qwen 3.5 and Gemma 4 on M4 Max:
>108960068 >108960632
--Mistral.rs benchmarks showing poor UQFF output quality vs llama.cpp:
>108957878 >108957885 >108958096 >108958129
--Addressing Gemma 4's repetitiveness in roleplay:
>108960336 >108960455 >108960593 >108960708 >108962888 >108962990
--Proprietary status and open-source promises of MiniMax M3:
>108956662 >108956673 >108956733 >108956692 >108960423 >108956722
--Coding agents preferring shell commands over built-in tool actions:
>108957947 >108957967 >108957980 >108957985 >108958007
--Local TTS recommendations for long-form narration and PDF reading:
>108961085 >108961152 >108961188 >108961212 >108961282 >108961744
--Mixed reports on llama.cpp PR for limiting llama_context outputs:
>108957117 >108957200 >108957226 >108957588 >108960370
--Using DuckDB and local datasets for offline information retrieval:
>108962182 >108962270 >108963394
--OS power plans and GPU clock locking for faster offloading:
>108958954 >108959002 >108959506
--ROCm support and stability issues with v620 GPUs:
>108956495 >108956554
--Comparing Go and Python memory usage for TTS server startup:
>108962253
--Logs:
>108957062 >108957878 >108958548 >108961425 >108962253 >108962759 >108963127 >108963543
--Miku (free space):
>108956410 >108960487 >108962255 >108962716

►Recent Highlight Posts from the Previous Thread: >>108956325

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Tetolove
>>
Tetomarriage
>>
File: 1752045141530539.png (23 KB, 1070x156)
nth for QoL
>>
File: c2_slop.png (99 KB, 674x668)
I ripped out Orb's slop detector and ran it on the c2 logs dataset
Now I need to make deepseek 4 rewrite the flagged sentences until no more slop is detected, then try training some shit on it
>>
File: 23a.jpg (27 KB, 500x375)
Can I force more of a moe model onto ram? If I just leave it on auto I can only fit Q4 moe qwen despite being able to fit Q6 moe gemma. And I have ram to spare.
>>
>>108964085
just j-j-jam it in
>>
Fuck, Marry, Kill
Miku, Teto, Neru

Go.
>>
>>108964103
Kill, Marry, Fuck
>>
>>108964103
Marry then kill, fuck then kill, kill
>>
>>108964079
Interesting idea. I mean if I cared I would implement something like this too.
> Scan output for sneed words -> generate second pass.
This can be automagic too.
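
A rough sketch of that scan-then-second-pass loop, for illustration only. The phrase list is made up, and `rewrite` is a hypothetical callback standing in for whatever model call does the second pass; none of this is Orb's actual detector.

```python
import re
from typing import Callable, List

# Hypothetical phrase list; a real detector would carry far more patterns.
SLOP_PATTERNS = [
    r"shivers? down (?:her|his|their|my|your) spine",
    r"barely above a whisper",
    r"not just \w+, but",
]
_SLOP_RE = [re.compile(p, re.IGNORECASE) for p in SLOP_PATTERNS]


def find_slop(text: str) -> List[str]:
    """Return every flagged phrase found in text."""
    hits: List[str] = []
    for rx in _SLOP_RE:
        hits.extend(m.group(0) for m in rx.finditer(text))
    return hits


def deslop(text: str,
           rewrite: Callable[[str, List[str]], str],
           max_passes: int = 3) -> str:
    """Rewrite until the detector is quiet or the pass budget runs out."""
    for _ in range(max_passes):
        hits = find_slop(text)
        if not hits:
            break
        text = rewrite(text, hits)  # second pass, e.g. an LLM call
    return text


def stub_rewrite(text: str, hits: List[str]) -> str:
    """Placeholder for the model call: just blanks out the flagged phrases."""
    for h in hits:
        text = text.replace(h, "[rewritten]")
    return text
```

The pass budget matters because a rewrite can introduce a different flagged phrase, so the loop is not guaranteed to terminate on its own.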
>>
>>108964079
We can anticipate the results: it will be more difficult to train compared to the raw logs, and the model will exhibit different slop, while also degrading in capabilities anywhere else than roleplay in the format of the logs.
>>
>>108963996
In what state do they benchmark closed source models? The providers fiddle with them and change system prompts almost daily.
>>
>>108964085
yes, with -ngl and -ot args
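
For anyone who wants it spelled out, here is a small sketch that builds such a command line. It assumes the usual llama.cpp flags `-ngl` (GPU layer count) and `-ot` (tensor-name regex mapped to a buffer type), and it assumes the MoE expert tensors are named like `blk.N.ffn_*_exps`. The model path and layer numbers are placeholders, so check them against your own GGUF and `--help` output.

```python
from typing import List


def moe_offload_args(model_path: str, first_cpu_layer: int, n_layers: int) -> List[str]:
    """Build llama-server arguments that request all layers on the GPU, then
    override the expert tensors of layers [first_cpu_layer, n_layers) to CPU RAM."""
    if not 0 <= first_cpu_layer < n_layers:
        raise ValueError("first_cpu_layer must be in [0, n_layers)")
    layers = "|".join(str(i) for i in range(first_cpu_layer, n_layers))
    # Assumed tensor naming: blk.<layer>.ffn_{gate,up,down}_exps.weight
    pattern = rf"blk\.({layers})\.ffn_.*_exps\.=CPU"
    return ["llama-server", "-m", model_path, "-ngl", "99", "-ot", pattern]
```

Raising `first_cpu_layer` keeps more experts in VRAM; lowering it pushes more of them into system RAM.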
>>
>>108964079
won't that just converge to the next thing you'll find annoying?
also, how long are these?
>>
Is Vulkan good enough nowadays that I can pick up a second AMD card with lots of VRAM to pair with my 5080?
>>
>>108964143
All I hear is seething from the AMD camp
>>
>>108964143
It works decently enough on my 7900 XTX, but that's just one GPU; I haven't tried a multi GPU setup with it.
>>
>>108964128
>>108964140
You're actually absolutely right
That would be no different than to just distill deepseek directly now that I think about it
>>
>>108964143
absolutely sir please to buy! supports is very the good
>>
ComfyOrg is a grift company. We need cumfart alternatives not engrain grifter projects into our chats. Absolutely disgusting
>>
>>108964167
You should be posting in that pewdiepie thread instead.
>>
>>108964167
You lost boy?
>>108964162
I don't understand the point to deepseek now that Qwen is here.
>>
>>108964162
i mean, you could change the slop profile, but not remove it entirely
maybe divide the dataset into x parts and each element have different guidelines for rewriting?
like 25% one style 25% other, etc.
at least it would be different
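
A minimal sketch of that split, assuming the dataset is a plain list of log strings and the style names are whatever rewrite guidelines you come up with (the ones in the usage note are invented).

```python
import random
from typing import Dict, List, Sequence


def assign_styles(samples: Sequence[str],
                  styles: Sequence[str],
                  seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle the samples, then deal them round-robin into one bucket per style
    so every style gets an (almost) equal share."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    buckets: Dict[str, List[str]] = {s: [] for s in styles}
    for rank, idx in enumerate(order):
        buckets[styles[rank % len(styles)]].append(samples[idx])
    return buckets
```

With four styles, e.g. `assign_styles(logs, ["terse", "florid", "dry", "casual"])`, each bucket ends up with roughly 25% of the logs, and each bucket can then be rewritten under its own guideline.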
>>
File: 1766130318643777.jpg (685 KB, 1756x2200)
>>108964147
>>108964155
>>108964163
Guess I'll just try finding a cheap (lol) 16GB 4060Ti or 5060Ti
>>
>>108964182
Your speeds are going to go to shit. Target a used dupe card or just get a unified memory system to cope
>>
File: gemmy.png (49 KB, 1041x831)
even in strokes gemmy is pure sovl
>>
>>108964140
These anti-slop methods will never work properly except for the low-hanging fruit, because the samples will get fixed independently from each other, and that's where new slop will be introduced.

Trying to fix the problem just by finetuning is not the solution. A big source of the problem is that during inference the various conversations and message swipes are independent from each other, and current samplers do not fix this. LLMs do not have memory of past messages for avoiding frequently used patterns.
>>
>>108964171
why?
>>108964172
OP image

>>108963996
use sdcpp instead of that bloated malware. fuck comfy
>>
>>108964176
>maybe divide the dataset into x parts and each element have different guidelines for rewriting?
Good luck getting it to converge
>>
>>108964197
this is a masterpiece, ex tier writing
>>
>>108964204
Because you are infatuated with youtubers.
>>
>>108964190
I was going off the figures this >>108956026 anon posted
Mid 20s t/s+ with Gemma MTP coming soon sounds good enough
>>
>>108964201
>because the samples will get fixed independently from each other, and that's where new slop will be introduced.
What if I tested it against the whole dataset blob? Not a single expression will be repeated
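
Checking against the whole blob could look something like the sketch below: count, for each word n-gram, how many distinct samples contain it. Whitespace tokenization and the n=3 default are simplifying assumptions, not a recommendation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def ngrams(words: List[str], n: int) -> Iterable[NGram]:
    """Yield every run of n consecutive words."""
    return (tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def repeated_ngrams(samples: List[str], n: int = 3, min_samples: int = 2) -> Dict[NGram, int]:
    """Map each n-gram to the number of distinct samples containing it,
    keeping only n-grams shared by at least min_samples samples."""
    doc_freq: Counter = Counter()
    for text in samples:
        words = text.lower().split()
        doc_freq.update(set(ngrams(words, n)))  # count once per sample
    return {g: c for g, c in doc_freq.items() if c >= min_samples}
```

Anything this returns is an expression that survived in more than one sample. Note that it will also flag ordinary idioms, so the threshold needs tuning.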
>>
>>108964228
As long as you're good with it that's the only thing that matters.
>>
File: fuck gemma.png (283 KB, 1378x1996)
>>108964197
Gemma copies are personalized
>>
File: 1714067749751.jpg (428 KB, 1825x1152)
>>108964103
Marry, Fuck, Kill
>>
>>108964167
>>108964204
You are still a raped retard, Julien
>>
File: 1763881939726576.png (75 KB, 776x202)
many such cases
>>
>>108964223
why would I be?
>>
>>108964167
>>108964271
Can you both fuck off back to your containment general? Thanks.
>>
>>108964244
damn, she got you there
>>
File: 1776554137411894.jpg (384 KB, 2120x1124)
>TO STATES
>• Implement a prohibition on standalone generative AI systems that have been built using unlawful web scraping, defined as the bulk and mass collection of training data through the World Wide Web, without protection against non-consensual collection of personal data.
>• Enact legislation requiring transparency regarding training data collection practices and accountability across AI supply chains, and further:
>• Require in law that technology companies, including those developing and deploying generative AI systems, carry out ongoing and proactive human rights due diligence to identify and address human rights risks and impacts related to their global operations. This must include clear regulatory frameworks requiring mandatory human rights impact assessments before the deployment of generative AI systems.
>[..]
>• Ensure meaningful consultation by independent bodies with affected communities, particularly those historically marginalized or discriminated against, throughout the lifecycle of the product.
>• Where AI deployments are identified as exacerbating existing inequalities or creating new forms of discrimination, to cease their use.
>• In all development, deployment and use of any AI system, guarantee access to effective remedy for human rights abuses linked to the impacts of technology companies, wherever the harms occur, including harms resulting from the operations of their subsidiaries, whether foreign or domestic. Redress mechanisms should be made easily accessible and understandable to enable individuals to file complaints when their rights have been infringed.
>>
>>108964298
Dose Amnesty International exclusively hire retards?
>>
>>108964278
catjak is here too so you can share his mental illness
>>
>>108964103
marry, fuck, kill
it makes the most sense if you think about it
>>
>>108964278
comfyorg should die for the enshitification of the ui. They killed a good app
>>
>>108964103
fuck fuck fuck fuck fuck fuck fuck fuck
>>
>>108964298
>they'll protest this but not the age verif everywhere absolutely raping any inch of privacy one might have
>>
>>108964298
thankfully this is so retarded on its face that it will be rightfully ignored. a ban on training on bulk data from the web is basically a blanket ban on LLMs kek. and as for the rest
>our idea of effective regulation is, um... you have to fill out a lot of paperwork that no one will read about heckin' systemic injustice!
look at my progressives dawg, we are never having an effective left wing movement ever
>>
>>108964259
Miku's still holding my post :)
>>
File: yawning.gif (143 KB, 220x230)
>>108964298
>some jewish ngo has an opinion on something
>>
>>108964298
basically eu ai act lol
>>
minimax m3 is soon(tm) but i feel nothing..
>>
>>108964465
it's gonna be 1t and the arch will never be implemented in llama.cpp
>>
>>108964465
Right after Q3.7 release for sure
>>
>>108964228
That includes a 5070 ti, which is basically a 3090 equivalent with 16gb vram. You're probably not getting those speeds with tensor parallel even with two 5060 ti.
>>
>>108960896
Under qwen 3.6 27b direction it chose more trip hop and R&B, same seed and settings in ace step. Will still dock a point for no mention of kitsune in the song
https://vocaroo.com/1fvHCXj0Vp2m
>>
>>108964465
I tried it over openrouter and it's certainly another minimax model.
I don't have a lot more to say about it.
>>
>>108963996
llama.cpp performance went to shit over the last couple of months; the older version I am using concurrently is twice as fast on qwen3moe
>>
>>108964613
any concrete metrics like llama bench and kld or just posting shit feels?
>>
>>108964626
>kld
That's a good point actually.
It could be that it was faster, but also that something was broken and the outputs were degraded.
I think something like that happened back in the 80B A3B days, IIRC.
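
For reference, the KLD check boils down to comparing the next-token distributions of a reference build and the build under test on the same text. A minimal sketch, assuming you already have per-position logit vectors from both (how you dump them is left out):

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_ref || P_test) for one token position, in nats."""
    p = softmax(ref_logits)
    q = softmax(test_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean_kld(ref: Sequence[Sequence[float]], test: Sequence[Sequence[float]]) -> float:
    """Average the per-position KLD over a whole evaluation text."""
    return sum(kl_divergence(r, t) for r, t in zip(ref, test)) / len(ref)
```

A mean near zero says the faster build produces the same distributions; a clearly positive value is the "faster but broken" case described above.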
>>
File: tetoTeamRocket.png (2.15 MB, 1024x1536)
>>108964298
> prohibition on such systems.
lol. ofc you can fill out a bunch of forms to get an Amnesty Int'l seal of approval.
Fuck these rent seeking mfer's.
Also link so other anons can point and laugh: https://www.amnesty.org/en/documents/pol40/0996/2026/en/
>>108964322
It's an NGO. So yes.
Also this: >>108964406
>>
>>108964517
I'd be pairing the 5060Ti with a 5080 albeithough
>>
>>108964298
Based. Ban all large scale training and deployment until regulations on lawful data use are developed and implemented. Open source all prior existing models trained on unlawfully obtained data. Put the technojews who orchestrated it all behind bars. Models trained on humanity's accumulated cultural output should be free, only models trained on novel data should be allowed to be closed.
>>
>>108964626
Of course 'llama cuda dev' defense force is here in action.
>>
>>108964143
It should be fine for -sm layer which just pipelines the GPUs; you can compile multiple ggml backends at once and then mix and match them at runtime.
For -sm tensor which attempts to run the GPUs in parallel mixing NVIDIA and AMD is a non-starter I think since there is no vendor support for synchronization between them.
>>
>>108964626
Posting my disbelief
> llama-bench
I'll build it later/tomorrow and post some results; got other stuff running on that machine rn
>>
>>108964748
will there be an option to combine tensor and pipeline parallelism at some point? I'd like to run 3 groups of TP 4 or 6 groups of TP 2 if that's faster.
>>
>>108964794
My ultimate goal is to have support for the combination of tensor and pipeline parallelism but that will require a refactor of the graph allocator.
One usecase will be to pipeline multiple copies of tensor parallelism with itself in order to hide the latencies of transfers between GPUs (unclear whether that will actually work out).
>>
>run llama/kobold on host
>run vibe slop agent in vm
Is this the way to do it?
>>
>cuda dev
>cuda dev
>cuda dev
when will they hire a rocm dev? aren't they being propped up by huggingface?
>>
>>108964919
this but save yourself the system resources and just use a docker container instead of a full-blown vm. hermes-agent has this built-in as a one-click setup.
>>
>>108964939
That would require rocm devs being a thing, sadly jensen and his cousin have conspired to have all of them disappear. Buy more to save more.
>>
>>108964950
Besides restricting file access what is sandboxing supposed to protect you from? If it's doing something malicious then isn't being on the same network already a risk?
>>
>>108964962
You don't have to give it network access to anything other than the inference endpoint
But restricting file access is the main point. If it's doing something nasty from inside the VM, you can just stop the VM. But if it has free rein to fuck with your .bashrc and such, it can persist itself in all sorts of ways that will be hard for you to detect.
>>
>>108964962
nta, it's more limiting the blast radius if your model does something stupid (trying to delete stuff it shouldn't, breaking your config/env, etc) rather than active malice
it's a gate to stop the baby from falling down the stairs not a home security system
>>
>>108958925
Broken for me, shows up as name style. Archive still shows them correctly though.
>>
Weird behavior I get with qwen 3.6 27b mtp, dmesg says Time jumped backwards, rotating. And my podman containers say they exited 292 years ago. Does anyone else have this or is this unrelated to llama.cpp? I usually run gemma (no mtp), and haven't had this happen. Ran qwen (no mtp) for a couple of weeks and didn't have this issue. Ran qwen (mtp), and 170k tokens in, this happens. I reboot, and try again and the server kills itself at 40k tokens. Ran without mtp and it crunched 250k fine.
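
Side note on the "292 years ago" figure: that is about the largest span a signed 64-bit nanosecond counter can hold, which would fit a bogus or overflowed timestamp rather than a real elapsed time. Podman is written in Go, where `time.Duration` is an int64 count of nanoseconds, but tying the display to that is a guess; only the arithmetic below is certain.

```python
NS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def max_int64_ns_in_years() -> float:
    """Largest duration a signed 64-bit nanosecond counter can represent, in years."""
    max_int64_ns = 2**63 - 1
    return max_int64_ns / NS_PER_SECOND / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{max_int64_ns_in_years():.1f} years")
```

If that is what happened, the containers did not really exit centuries ago; the clock jump reported in dmesg just produced a time difference too large to represent.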
>>
File: 1768140807266246.png (52 KB, 648x311)
>>
>>108965068
n-no, nothing ever happens!
>>
>>108965068
Did they forget to pay their bribes on time?
>>
3.3 70b uncs what are you running
>>
>>108965068
finally, the models are going to be BASED as fuck now!!! MAGA!
>>
>>108965047
you’re gonna have to do more debug research than dmesg saying time jumped backwards. what did your ntp client do?
>>
>>108965128
GLM 4.6 IQ3KS/4.7 IQ2KL ubergarm for co-op writing story stuff, Gemma 4 31b Q8 for RP, Qwen 3.6 27b MTP Q8 for code.
>>
>>108964960
yeah but can’t some poor vibe coder shit put some functional support? lack of driver software should be a solved problem pretty soon. all in on intel and amd!
>>
>>108965128
Gemma 31B F16
>>
>>108965161
how's the speed if you don't mind me asking. Also how do you feel about smaller models making these jumps?
I think it would allow you guys to run some crazy workflows no?
>>
>>108965141
I don't know how to check what my ntp client did, so I went to ask qwen (mtp) and it instantly killed my server lmao.
>>
>>108965171
you’re in over your head if you can’t read your system journal
>>
>>108965177
>you’re in over your head if you can’t read your system journal
Evidently.
I let gemma take a look and she says my logs are all clean. Too clean - they just cut off at the time of the crash. Some kind of hardware fault that only rears its head with qwen (mtp)?
I think I'll stick with gemma for now.
>>
>>108965167
2x 3090 128GB ddr4 3200 windows: 4.5-5.3 t/s, 19-25t/s, 40-60t/s. pp for gemma and qwen 1000-1700, forgot for glm.
>Also how do you feel about smaller models making these jumps?
They're just much better at following creative instructions, Gemmy especially, while having the same problems as before like slop and context rot, but the feeling of context rot is now noticeable to me around 8-16k instead of 2-4k for stories and RP.
>I think it would allow you guys to run some crazy workflows no?
Only new thing I'm doing is using Qwen in Cline because it's good enough to do so 90% of the time, meaning it can use Cline not that it doesn't fuck up the code occasionally. Works best if you have some knowledge or come up with a plan and tell it to do specific things. "Make so and so that do exactly this and wire to this" and not pure vibecode "make this feature".
>>
>>108965275
I had 35 tks pp for glm 4.5 on two 3090s and ddr4 3200.
>>
When will powerful local LLMs be accessible to people with consumer hardware?
>>
>>108965068
>Situational Awareness is now 2 years old
>people still haven't read it
Government taking control over AI is inevitable. Leopold's prediction that it will happen in 27/28 seems accurate. I hope you people aren't retarded enough to be surprised when open source AI will become heavily regulated and largely outlawed.
>>
>>108965295
qwen3.6 exists
>>
>>108965302
You need at least Q8 and 200k context to do anything useful with it.
>>
>>108965295
within the decade
>>
>>108965314
Why are you spreading misinfo?
q5 and up are fine
>>
>>108965295
powerful is a moving target and datacenters will always be better than consumer setups
you'll probably be able to run something mostly as powerful as today's best stuff in a year or two, but by then there will be even better stuff in the cloud
>>
>>108965345
*it will be banned
>>
>>108964278
>both
It's actually just petra. She has this tactic of accusing herself with other names to deflect the blame to people she doesn't like.
>>
>>108965384
shut up nerd
>>
>>108965161
>GLM 4.6
Why? Because NovelAI told you to not use 4.7? Kill yourself fucking shill.
>>
kek!
>>
>he's back
>>
>>108965403
do not to whine:!
>>
>>108965403
3 days fly by so fast
>>
>>108965292
Just checked it, getting 120-160t/s with ubatch 4096 for 7500 new tokens on ik_. But of course it depends on new token count, PCIe bandwidth (i'm running 3.0 x8), and `-cuda offload-batch-size=` if low new token count.
>>
>>108965298
They can't unpublish models so they're going to just sabotage the inference engines like llama through normal FOSS social engineering vulnerabilities. Primarily solodevs like KoboldGOD are the only real path forward.
>Vibecode your own
Doesn't build sustainable infrastructure longterm.
>>
>>108965454
>They can't unpublish models
>makes hf illegal to access in your path
>>
>>108965132
fuck yeah, no more antisemitic models! praise israel!
>>
>>108965465
Torrents still exist.
>>
>>108965465
>hf illegal to access
And herd everyone over to a chinese replacement site?
It's a no-win scenario for them trying to kill open source genAI. All they can do is make it a pain in the ass to get data and stop big corps from releasing.
>>
>>108965403
Do you like FUD? Did you like getting spammed about 4.7 being more censored without anyone ever offering proof? Just because there's a fucking shitty company with a paid subscription stuck with it? Fucking worthless shill. Go try making money somewhere else fucking asshole.
>>
>>108965489
lol good luck getting goog to drop gemma5 via torrent like mistral used to do i guess
>>
>>108965465
>make crime illegal
gee, guess I'll just give up
>>
>>108965512
all that matters is labs giving up, not like toones or anything community led ever did anything for us
>>
>>108965521
yes, every lab across the planet would simultaneously give up
>>
>>108965500
You're out of your mind if you think Goog ideologues won't open source and torrent the weights of everything they can if they think DRUMPF is coming for them.
Despite spending the past half-decade beating their chests about muh safety, censorship, and all that gay shit, they will absolutely release le scary dangerous AI to empower the brave trans folx and peeohsees against voldemort megahitler. Some might argue that's why 31b released as good as it did as testing the waters for an open Gemini flash release.
>>
>>108965521
Remember how all the EU labs got gigafucked by legislation and it didn't matter at all to the rest of the world's labs? Same thing if the US does it. None of the US's best labs even release open source other than Google.
>>
>>108965555
EU doesn't have anything worth releasing doe.
>>
>>108965068
I'd rather have him oversee them than sam or dario desu
I want my models without feminism trained into them
>>
>>108965566
Like the US, they had one lab worth a fuck to open source, in their case: Mistral.
>>
>>108965572
you'll get that, as well as no porn, monkey paw type shit
>>
>something older from before the age of man
SHUT UP GEMMA, YOU CAN'T SENSE AGE WHEN CASTING SPELLS
>>
>>108965572
>>108965582
I just hope everyone of these niggers loses desu
>>
File: 1777840288835931.jpg (60 KB, 552x667)
is there any place where i can try different models at different quants to check what is good enough for the jobs i want to do before investing in ewastemaxxing?
i dont mind needing to upload myself the models, but i would prefer to not need to make a virtual machine and install everything
>>
>>108965599
I guess rent a cuckpod (runpod) or something like that?
>>
Why do we need programmers to write code using AI if the future is supposed to involve using agents that are designed to render that very software obsolete and automate it in the background?
Just so they don’t lose their jobs for a little while longer?
If we went straight to using agents, couldn't we save ourselves all the computing power we are currently pouring into software that will be obsolete tomorrow?
>>
>>108965617
You make money directing agents moron, human conductors are needed
>>
>>108965596
read the order. it's a nothingburger
https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/
it's just some shit about vuln scanning and codifying a retarded early access mechanism for le uber haxxor models to be used first by cyber security. basically just encouraging them to partner with the nsa to backdoor their shit or whatever
>Nothing in this section shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement for the development, publication, release, or distribution of new AI models, including frontier models.
friendly reminder that EOs do precisely nothing other than be public facing text for whatever bullshit policy they are already following internally
>>
>>108965646
> We need conductors to orchestrate agents to develop Excel for office bitches so we can make money

Are you attached to your job? Why should the office bitch use your Excel when agents can do her work too? Why do we need you conductors to orchestrate your Excel for her?
>>
>>108965696
You're not a bright one
People like you are why I sleep well at night
>>
>>108965696
>give the wheel to american corpobot
>reports you to authorities for tax evasion
>>
>>108965721
What end-user software do we need that we couldn't replace with an end-user agent interface?
Facebook and Candy Crush?
>>
>>108965737
Infrastructure retard
>>
>>108965748
Why do we need so many of you for that? AI isn't getting any worse - on the contrary, surely most of that can be streamlined away.

And why so aggressive? Isn't that the master plan behind AI?
>>
Back to base(d) Gemma I go.
>>
>>108965572
> begging for government regulation
Pic related
>>
>>108965165
Well ask geohot and you will discover the horror that is AMD's software division and how isolated they are from the rest of the company.
>>
I think gembrain might be alright for a finetroon
most others just feel like a downgrade or schizo
this one feels like an actual sidegrade to base gemma though
>>
Reminder to not fall for Nvidia's propaganda, that new notebook of theirs is a Mediatek and those suck for local LLMs and anything that actually uses the GPU
>>
>>108965572
>It's fine when I like the boot on my face
>>
https://huggingface.co/google/CircularNet
sirs what is this?
>>
>>108966039
>see poster
>see poster linkedin.com/in/ link on hf profile
>Bengaluru, Karnataka, India
of course
>>
File: file.png (168 KB, 1023x708)
>>108966039
hmm
this is just a dataset but an open image model by googl... me thinks could be cool
>>
>>108965962
I mean, if they're going to sell it at a good price (lol) it can be a good product even if the performance is lacking.
>>
>>108966052
>Change the text "NAUGHTY" to "KINDNESS"
based
>>
>>108966039
https://sustainability.google/stories/circular-economy-marketplace/
>CircularNet: How Recykal built Asia’s largest circular economy marketplace using Google AI
>September 2023
>India produces around 62 million metric tonnes of waste a year
>CircularNet, Google’s open-source machine learning model for waste management
>now operating in more than 30 Indian states and union territories
>>
>>108966072
holy lamo
>>
>>108964201
isn't the problem itself intractable to begin with? Once you start hitting real phrases people use, you're just detecting and flagging cultural noise lmao, and there's always going to be that.
>>
>thread culture mentioned, melt incoming
>>
File: file.png (246 KB, 491x451)
I tried to make a self supervising language trainer for a non-language model. It didn't work, but this excerpt from the logs cracked me up
>Child: % i=aiiyie c2:s$is&o a eleP\ e in Xrhy l ao te e onrieii notr e aa sh
>Parent (GPT-2): yeah
>Child: i wintlnh,hate oilsho tieeieecRrnihruoe wi otapnmocnmany.peiiu wn2 e.eblntbriw, dwoX.sow
>Parent (GPT-2): I love my son
>>
> The AI Alliance wants to train a frontier base model by sharing weight deltas instead of data, so contributors keep their corpora local
>https://thealliance.ai/blog/project-tapestry-the-path-to-frontier-sovereign-ai
>>
>>108965919
So, you’re telling me intel has a chance?
>>
>>108966284
Same chance as OpenAI releasing all their models for free with MIT license.
>>
What is my opinion on Ed Zitroon?
>>
>>108965607
i guess it should be a good option to try
>>
>>108966322
wrong about almost everything
>>
>>108966111
It is pretty funny anon, thanks for sharing
>>
are gemma-chan's quants still the meta for /d/eranged RP? Her sloppisms are getting a bit grating. I'm considering trying to force thinking blocks, not just to enforce sloppism rules but to make her consider the complex maneuvering on the card too.
>>
>>108966600
How would you affect her reasoning blocks themselves? I tried half-assedly but it didn't make any difference.
>>
>>108966626
She LOVES System so i figured a mix of very strong system prompts, ban the semicolon, ban tokens "not just" as well as strict "Phases" across a linear timeline on the card might get things in line.

before i turned them off reasoning basically made her fixate on whatever 'phase' we were in which was nice for a bit, but I'm not sure how to make her take initiative. Maybe having her track "Variables" in the thinking block and enable feeding the prior block as context.
>>
>>108966626
I've gotten gemma to follow an exact reasoning sequence to the letter by putting it in post history instructions as system.
The only problem was that it sometimes repeated it, which was easily fixed by setting a reasoning token budget.
>>
File: redditqwen.png (67 KB, 757x687)
Which of the two models does /lmg/ use?
>>
>>108966677
Qwen 3.6 27B for coding and Gemma 31B for everything else. Moe models are cope
>>
What's the best model for rtx3050?
I need 64k context.
I have no idea and everyone I know is lying to me by saying I should give up

>>108966677
Oh sounds like there are only two. What quant would fit in my setup?
>>
>>108966695
Either learn how to suck dick or give up, you're not running shit of value with that piece of shit.
>>
>>108966689
>Gemma 31B
Not one of your options chud
>>
>>108966689
I have never seen any evidence of Qwen being better at coding. I have seen people posting logs of Qwen being far worse at tool calling.
>>
>>108966695
Qwen's context is so cheap, you might actually be able to fit 64k context with the moe with the experts in RAM.
>>
>>108966704
Name a consumer GPU that can run Gemma 31B at 200k+ context without KV cache being quantized?
>>
>>108966701
I lost my job year, I have to spend my savings on rent, so I run shit with my card.
>>
>>108966718
Figure it out bud
>>
>>108966714
So the argument went from Qwen being better at writing code to it using less VRAM?
>>
>>108966714
NTA, so qwen kv cache is smaller? huh
where does one learn all this stuff
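
The back-of-the-envelope version, as a sketch: a standard attention layer caches one K and one V vector per KV head per token, so the cost scales with layer count, KV heads, head size and context. The numbers in the usage note are made up for illustration and are not the real Qwen or Gemma configs.

```python
def kv_cache_bytes(n_attn_layers: int,
                   n_kv_heads: int,
                   head_dim: int,
                   n_ctx: int,
                   bytes_per_elem: int = 2) -> int:
    """Size of an unquantized KV cache for full-attention layers.

    The leading 2 covers K plus V; bytes_per_elem=2 corresponds to f16.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def to_gib(n_bytes: int) -> float:
    return n_bytes / 1024**3
```

A hypothetical model with 32 attention layers, 8 KV heads and a head size of 128 needs `to_gib(kv_cache_bytes(32, 8, 128, 65536))` GiB at 64k context in f16. Fewer KV heads (GQA), or layers that only keep a sliding window or no KV cache at all, shrink that number, which is one way two models of similar size can have very different context costs.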
>>
>>108966695
>I have no idea and everyone I know is lying to me by saying I should give up
Don't give up, just accept less than one token per second.
>>
>>108966695
a p100 is like 100 bucks, just saying


