/g/ - Technology

File: miku_flag.png (1.37 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101749053 & >>101739747

►News
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
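Both calculators above reduce to simple arithmetic. A minimal sketch of the VRAM estimate (quantized weight bytes plus fp16 KV cache); the default shapes are illustrative, roughly Llama-3-8B-like, and the real calculator accounts for more overhead:

```python
def approx_vram_gib(n_params_b, bits_per_weight, ctx=8192,
                    n_layers=32, n_kv_heads=8, head_dim=128, kv_bytes=2):
    """Rough VRAM estimate in GiB: quantized weights plus KV cache.

    Defaults are illustrative (Llama-3-8B-like shapes), not universal.
    """
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # KV cache: 2 (K and V) * ctx * layers * kv_heads * head_dim * bytes/elem
    kv_cache_bytes = 2 * ctx * n_layers * n_kv_heads * head_dim * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1024**3

# e.g. an 8B model at ~4.6 bits/weight (Q4_K_M-ish) with an 8k context
print(round(approx_vram_gib(8, 4.6), 1))
```

By this estimate an 8B Q4_K_M fits comfortably in 8 GB of VRAM at 8k context, which matches the rule of thumb the calculator gives.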

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101749053

--vLLM GGUF loading issues with Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf: >>101752460 >>101752493 >>101752587 >>101752887 >>101752942 >>101754105 >>101754463
--Using LLMs for code obfuscation and randomness: >>101750235 >>101750681 >>101751579 >>101751645 >>101751742 >>101752513 >>101753593 >>101753898 >>101754413 >>101754550 >>101754571
--Training a multimodal LLM for reverse stable diffusion captioning: >>101751771 >>101751869 >>101751899 >>101751940 >>101752015 >>101751930
--InternLM 2.5 20B model impresses with benchmark results: >>101750268 >>101750302 >>101751043 >>101751291 >>101753224 >>101753296 >>101750400
--Florence and Kosmos for multimodal image description: >>101750080 >>101750118 >>101750228 >>101750173 >>101751908 >>101751942 >>101751950
--Anon discusses prompt execution time and computer specs: >>101754319 >>101754426 >>101754477 >>101754507 >>101754698 >>101754725 >>101755844 >>101755911 >>101756005 >>101756120 >>101755903
--Using video games to test AGI morality is flawed due to inconsequential decisions: >>101754707 >>101754720 >>101755315 >>101755394 >>101755515
--Updated Mistral preset for Large and other models: >>101751804 >>101753610 >>101753679
--Hardware suggestions for power-efficient Gemma 2 27b inference server: >>101752368 >>101752925 >>101752951 >>101753060
--DeepSeek implements caching on API, others to follow: >>101757114 >>101757169 >>101757232 >>101757482
--fp8 vs fp16 and schnell quality issues: >>101749508 >>101749522 >>101749599 >>101749963
--LORA hotswap endpoint merged, mixture of LORAs idea status unknown: >>101753955 >>101753979
--Anon wants speculative decoding for server mode for coding tasks: >>101752977 >>101753322 >>101753548 >>101753584 >>101753613
--Miku (free space): >>101749957 >>101751995 >>101752374 >>101752420 >>101752435 >>101752960 >>101752990 >>101753050 >>101753073 >>101755688 >>101756008
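The speculative-decoding request in the highlights (>>101752977) is easy to sketch in toy form. This is a hypothetical greedy variant with stand-in `draft_next`/`target_next` callables, not llama.cpp's actual implementation: a cheap draft model proposes a few tokens, and the target model keeps the longest agreeing prefix.

```python
def speculative_decode(target_next, draft_next, prompt, n_draft=4, max_new=12):
    """Greedy speculative decoding sketch.

    `draft_next`/`target_next` are hypothetical stand-ins: callables mapping
    a token list to the next token. The draft proposes n_draft tokens; the
    target accepts the longest agreeing prefix, then supplies its own token
    at the first mismatch.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # cheap draft pass: propose a short continuation
        ctx = list(out)
        draft = []
        for _ in range(n_draft):
            tok = draft_next(ctx)
            draft.append(tok)
            ctx.append(tok)
        # verification pass: accept while the target agrees
        ctx = list(out)
        accepted = 0
        for tok in draft:
            if target_next(ctx) != tok:
                break
            out.append(tok)
            ctx.append(tok)
            accepted += 1
        if accepted < len(draft):
            out.append(target_next(ctx))  # target's token replaces the mismatch
    return out[len(prompt):len(prompt) + max_new]
```

The output is identical to plain greedy decoding of the target; the speedup in a real engine comes from verifying all the draft tokens in one batched forward pass instead of one pass per token.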

►Recent Highlight Posts from the Previous Thread: >>101749058
>>
Dead general.
>>
>>101757635
We need to revitalize it with more mikus!
>>
how's the new magnum 32b?
>>
>>101757635
I miss the log posters and the Miku Movers who made this thread really comfy; now it's just full of /pol/ schizos.
>>
>>101757635
Dead internet.
>>
https://x.com/HalimAlrasihi/status/1820918388002009363
>>
https://blackforestlabs.ai/wp-content/uploads/2024/08/tv_no_screen_2.mp4
>>
>>101757729
>>101757743
Do you have vanilla? I'm not into your fetishes.
>>
>>101757729
>not even black just mannequin colored
If you're going to live out your cuck fetishes, at least do a good job.
>>
>>101757729
Holy mental illness. Get help
>>
>>101757772
You obviously have a racemixing fetish, you told us so even.
>>
>>101757742
Looks like if you replace the 2 with a 3, you get a higher bitrate video.
>>
>>101757742
Do I even want to know what the VRAM requirements for this are going to be?
>>
>>101757791
Miku mindbroke him LOL
>>
>>101757815
It's not really cope when you told us.
>>
>>101757804
1 and 3 are hevc, don't play in my browser.
>>
>>101757742
huh, they actually have info on their site
>>
>>101757840
Projection because you clearly are to the point where you have gigs of this stuff saved on your computer.
>>
can someone host local models for us to test instead of gatekeeping?
>>
>>101757772
based
>>
>>101757849
https://blackforestlabs.ai/up-next/
Not that much info.
>>
>>101757890
kek, didn't read. Have fun seething.
>>
>>101757875
Use colab/kaggle poorfag
>>
>>101757908
>Obsessed for months
Nah, you're a mentally ill schizo. Glad I'm not having a meltdown over an anime waifu
>>
>>101757901
Speaking of German AI, has AlephAlpha done anything noteworthy in the past few years? Last I heard from them was their shitty proprietary 70b that they advertised as the definitive BLOOM/OPT-175B killer.
>>
>>101757908
>gets filtered
Brainrot is so easy to spot these days, it was probably all cope anyway.
>>
>>101757926
you shut the fuck up retard how can you expect people to learn locals if you fucking say "LOL POORFAG"
>>
>>101757901
i just meant their announcement post
>>
>>101757944
I filter brainrot words as well, zoomers are so fucked up that normal people look mentally ill to them.
>>
I'm quitting
>>
>>101757948
There is no use in seething about being poor. I'm poor myself and have to run the models on CPU and RAM.
Just get more RAM.
>>
>>101757742
How can a company appear out of nowhere and absolutely destroy SD? I think the chinks are behind this.
>>
File: 1638735741761.jpg (28 KB, 510x510)
>>101757948
I've given you free options, zoomer; be grateful and fuck off. You're not worth anyone's time
>>
>>101757908
You need new material
>>
>>101757601
so how well does CPU offloading work in vLLM?
>>
>>101757948
why do you think anyone here cares if you learn anything?
>>
QUIT BEING POOR IS WHAT YOU SAY, BUT I LIVE IN A FUCKING COUNTRY WHERE I CAN'T JUST GET SHIT FOR CHEAP. NO WONDER BARELY ANYONE GETS INTO LOCALS, BECAUSE YOU SHOULD FUCKING HELP
>>
>>101758083
I'm saying that I'm poor too.
I don't even have a video card...
>>
>>101758083
Dude, first of all, relax, retard. Secondly, what you can run is going to be heavily tied to what hardware you can buy/afford, so start saving. If you can't wait, rent a RunPod, if you can still manage to put money aside for food, housing, fuel, and a little fund for better computer parts. Say goodbye to other time and money sink hobbies as well while you are at it.
>>
>>101758173
>If you can't wait buy a runpod
Doesn't that require ID + photo KYC bullshit?
>>
File: ComfyUI_00069_.jpg (298 KB, 1024x1024)
>>101757672
>>
>>101747334
Need more info/updates on this
>>
>>101758190
No idea, if it does then fuck that and your new hobby is now saving money, kek.
>>
File: 1721245034982228.png (195 KB, 407x353)
>Try 405b
>It's really good
It's not fucking fair, bwos...
>>
File: 1716578696899155.png (3.4 MB, 1451x1907)
Is this actually better than Stheno 3.2? https://huggingface.co/Sao10K/L3-8B-Niitama-v1
>>
>>101758409
I use 405B and Mistral Large on openrouter and I find myself preferring to use Mistral most of the time
>>
>>101758442
I didn't test it much but seemed like a wash aside from it being a little less horny by default.
>>
alright bros what cool new models we got
>>
>>101758512
I'm not your bro, pal.
>>
>>101758548
I'm not your pal, buddy.
>>
>>101758512
>>101758548
>>101758559
I am your bro, pal and buddy, fellas.
>>
>>101758566
then why didnt you answer my question :(
>>
>>101758575
My silence is saving your soul, please understand
>>
>>101758512
it's been months and there still hasn't been a release any better than commandr 35b desu
>>
>>101758581
But anon I need my celeb sex toy ai
>>
>>101758583
What is command r good at?
I just downloaded it a while ago.
>>
>>101758594
Starling 7B Beta.
>>
>>101758385
Probably nothing; the guy's posts don't seem that intelligent, so it's likely someone just padding his paper count with bullshit. I'm starting to believe that guy who said LLMs will be smarter than 'AI researchers' soon.
>>
the fuck is a brainstorm model?
>>
>>101758629
But anon thats old
>>
>>101758583
Gemma 2 27B is pretty damn close. Hell of a lot smarter than CR while still having a nice writing style. Too many spine shivers and voices barely above a whisper though.
>>
>>101758472
>Use mistral-large on openrouter
>Begins repeating itself within the first message
I dunno, man. It's very technically proficient, but I just can't get it to stop repeating.
>>
File: FwNjj8ZWwAMpdnl.jpg (83 KB, 896x898)
>>101758512
>>101758583
>>101758690

See anon, that's how you get answers on this board. Don't ask to be spoonfed, just post an opinion that will bait responses you want.
>>
>>101758409
sterile as fuck. It's good if you're a regular user, but shit if you want creative writing/roleplaying. Even then, largestral is better at coding, so it's lacking for productive cases as well.
>>
File: linus_fanbase.jpg (79 KB, 812x621)
>>101758751
>Don't ask questions, just consume LLMs and then get excited for next LLMs
>>
>>101758751
This but unironically
>>
>>101758750
I posted about this last night. Same exact setup and same exact problem. I hypothesized it was related to Context Size as I couldn't modify it via the OpenRouter API on Kobold but idk what front end you're using.
>>
>>101758846
I'm using Sillytavern, let me mess around with context size.
>>
>>101758785
L3 405B needs higher temperatures to get sovl out of it. If you aren't running at least 1.0 you're wasting your compute.
>>
File: Sad.png (510 KB, 1014x819)
>tfw people are talking about using 405b
>me, stuck at 70b
>>
>>101759055
Anon, those of us talking about 405B aren't running it on our own computers
>>
>>101759055
And I'm stuck at 30b models running on RAM
>>
>>101759055
405B can be used for free by trialscumming together.ai
>>
>>101759055
>>101759073
12B checking in (T-T)b
>>
>>101757635
with flux and the latest generation of LLMs, people are spending less time posting and more time actually gacha-rolling AI content
>>
>>101759099
Don't they have the worst prices, low context, insanely quanted models, and a shit free trial now though?
>>
>>101755281
>>101755355
>>101755454
>>101749539
I can finally coom to a 2B model
>>
>>101759133
>free
>prices
can you rephrase your question in a less schizo way that doesn't make it seem like you don't understand what "free" means?
>>
>>101759211
I mean sure, if you want to create a new burner every time you want to get three generations of quantslop rather than spending $3 for better options on OpenRouter, that's your prerogative
>>
>>101759275
I do use openrouter personally, but I'm assuming the frogposter was some kind of poor if he was upset about not having access to 405B
last time I trialscummed Together they gave $25 credit on signup
>>
File: 1722114549343438.png (342 KB, 856x665)
ZLUDA has been taken down.
https://github.com/vosen/ZLUDA
Why is AMD so retarded?
>>
>>101759347
The CEO of AMD and Nvidia are related.
That should be all the information you need to piece together what the scam is.
>>
>>101759133
>worst prices
They don't have the worst prices.
>low context
It seems.
>insanely quanted models
They have INT4, FP8, and FP16. The livebench benchmark was done with FP8 and 405B is the 3rd best model.
>and a shit free trial
I think it's $5, but I still have my $25 from a few months ago.
>>
>>101759434
They do though. Input price dominates, and it's $4.50/Mtok on Together, which is the highest out of all providers, on top of a whopping 4k context.
There's truly no reason to use them if you can avoid it
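The price comparison is straightforward arithmetic: per-request cost is token count divided by a million, times the per-Mtok rate. A sketch using the quoted $4.50/Mtok figure (the output price is assumed equal to the input price here, which may not match Together's actual rate card):

```python
def request_cost_usd(input_tokens, output_tokens,
                     in_price_per_mtok, out_price_per_mtok):
    """Cost of one API request given per-million-token prices."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1e6

# a full 4k-token context plus a 500-token reply at the quoted $4.50/Mtok
cost = request_cost_usd(4000, 500, 4.50, 4.50)
```

At those rates a full-context request costs about two cents, so a $25 trial credit buys on the order of a thousand full-context requests.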
>>
File: file.png (74 KB, 1827x1031)
>>101759474
>which is highest out of all providers
False. They also have best throughput among the ones in OpenRouter.
>>
>>101759434
One thing I forgot to mention, the FP16 models are reference models. Those will double in price after August 31 to encourage the quants.
>>
>>101757682
It's good, but they need to stop being fags and either make the v2 magnum 72b or do an l3.1 70b magnum
>>
>>101759515
On OpenRouter, you tard.
Also, there's your 2 token/second throughput.
The service is shit. Get money and use a better provider.


