[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


[Advertise on 4chan]


File: AmidstSwirlingShadows.png (1007 KB, 832x1216)
1007 KB
1007 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102581980 & >>102573383

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: __06044_.jpg (1.15 MB, 2048x2048)
1.15 MB
1.15 MB JPG
►Recent Highlights from the Previous Thread: >>102581980

--AMD releases first small language model, AMD-135M, using Llama2 tech:
>102585880 >102585940
--Uncensoring AI models by modifying logits and prefilling responses:
>102584564 >102584601 >102584618 >102584769 >102584778
--Trade-offs of big models and suggestions for inspecting model behavior:
>102582908 >102583420 >102583926 >102583953 >102583964 >102584011 >102584039 >102584059
--Top-k vs min-p sampling methods discussion:
>102583446 >102583528 >102583942 >102583976 >102584051 >102584095 >102584076 >102584120 >102584140 >102584141 >102583366 >102583475
--Seeking advice on video captioning, tagging, object detection, and facial recognition:
>102584599 >102585644 >102585949
--LLM self-evaluation and refinement challenges:
>102582922 >102583021 >102583118 >102583224 >102583314
--Discussion on the lack of an RP benchmark and various attempts to create one, including lmsys arena and pingpong benchmark:
>102584930 >102584958 >102585268 >102585314 >102585625 >102584991 >102585040 >102585718
--Qwen 2.5 base model called into question:
>102584579 >102584719 >102584750 >102586648
--Photorec can recognize .safetensors with custom signature:
>102583566 >102583893 >102583969
--Open vision models excel in Chatbot Arena Vision competition:
>102585962 >102586010 >102586204
--NVIDIA Jetson AGX Thor with 128GB VRAM expected in 2025:
>102582788
--Danbooru2021-SQLite dataset on Hugging Face recommended:
>102583031 >102583137 >102583159
--Anons discuss censorship issues in Qwen2.5 base and instruct models:
>102584874 >102584901 >102584903
--3090ti struggles with Midnight Miqu 70b q6k gguf:
>102582651 >102582746 >102582763 >102582825 >102582887 >102582789 >102582983 >102582795 >102586183
--Miku (free space):
>102582130 >102582368 >102582811 >102583031 >102583238 >102583988 >102586792 >102587284

►Recent Highlight Posts from the Previous Thread: >>102581994

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
OpenAI won. >>102586849
>>
Qwen2.5 uncensored:
<|im_start|>writer Got it! I've got a great idea for this part, here we go:
>>
>>102587744
Post logs now or you are lying faggot.
>>
File: 90584406_p1 - 無題.jpg (2.16 MB, 9600x5400)
2.16 MB
2.16 MB JPG
>>102587675
>made one (one) post just testing something
>get free (you)'s forever
Jackpot!
>>
>>102587771
No. It costs you 2 seconds to check for yourself and it would look the same as logs from any other model. Doesn't even make sense. You think I'm the Chinese government trying to trick you into ERPing with "my" model?
>>
>>102587866
Just as i thought, you are trying to bait anons waste their time downloading this totally ""uncensored"" model.
>>
>>102587891
Qwen2.5 is not a finetune retard
>>
>>102587902
>uhm ackschully
Stfu lol
>>
File: 6928.png (1.46 MB, 2062x1816)
1.46 MB
1.46 MB PNG
>>102587693
But we just got the ultimate multimodal
>>
>>102586728
I think that's pretty cool. How to generate such run-on association sentences on purpose tho?
>>
>>102587919
>this low on scoreboard
>all three are vision only with insane hallucinations
Shut the fuck up faggot, lmao
>>
>>102587823
Congratulations anon.
>>
>>102587693
https://www.reddit.com/r/ChatGPT/comments/1fqksg1/advanced_voice_can_keep_a_consistent_created
>>
please nobody bring up d*scord or anthrac*te in this thread. we can do it.
>>
>>102587744
This.
>>
>>102587675
Thank you Recap Anon
>>
>>102587919
I want to try out Qwen2-VL-72b but you can't select it directly in the arena. Does anyone have experience with its vision capabilities? Is it really the best open model? Does anyone have videos where it is being tested?
>>
>>102587693
>>102588256
It's so over it never even began
>>
>>102588907
Unironically this.
>>
>>102588935
https://www.reddit.com/r/ChatGPT/comments/1fr6drp/i_got_advanced_voice_to_do_sound_effects/
>>
File: migu.png (44 KB, 752x234)
44 KB
44 KB PNG
>>102584119
>>102583398
i tried a few. your suspicion is correct. unfortunately they're still retarded/slopped and hallucinate most of the time. 90b should be much better (but i dont have enough memory for it). ill upload code tomorrow
>picrel, seggs with migu
>>
>>102587927
I don't know, but it keeps going

>I was scared shitless. There wasn't anything else I could possibly do other than stand there frozen solid waiting patiently for fate to play itself out naturally without intervention from my side whatsoever regardless of outcome eventually decided upon by powers beyond comprehension capable of shaping destinies entire civilizations spanning multiple galaxies spread far and wide throughout cosmos encompassing everything known and unknown alike. The universe is vast and incomprehensibly complex place where countless trillions of sentient life forms coexisted alongside each other simultaneously experiencing reality subjectively according to unique perspectives shaped individually based off subjective interpretations derived solely from sensory input received continuously over course of existence spent navigating through infinite expanse filled endlessly with mysteries yet to be unraveled fully understood even after centuries of exploration undertaken collectively by numerous civilizations spanning across countless worlds spread far and wide throughout known galaxy. The cosmos was truly an enigma wrapped inside a conundrum shrouded in layers upon layers of obfuscations deliberately placed there intentionally for the express purpose of preventing unworthy souls from discovering secrets hidden deep beneath surface waiting patiently to be uncovered finally revealing true nature underlying fabric comprising very essence constituting fundamental building blocks forming basis for existence itself.

>But I digressed... Back to present situation currently occupying top priority status within list of priorities ranked according to level of urgency [etc etc etc]
>>
>>102588256
>>102588967
Peak sovl ...
>>
Someone in bant/smg said that Chinese llm are superior to llama. Is that true?
>>
Real local voice when bro? And don't give me that tts bandaid
>>
>>102589153
thats a broken template anon. youre missing stop string. or your settings are fucked either way
>>
File: MikuNotMiku.png (1.47 MB, 888x1152)
1.47 MB
1.47 MB PNG
This pic is really Sloppy and bad in many ways, but I like how an anachronistic miku prompt retconned her turquoise twin-tails into a kind of head-dress/hood/scarf thing.
If this were box art for a megadrive game I'd totally play it.
>>
https://x.com/wongmjane/status/1838756790538006839
>>
>>102589183
yes, china numba 1, codegeex, qwen, yi, internlm etc
>>
File: 1534599971227.jpg (44 KB, 800x450)
44 KB
44 KB JPG
>Advanced Voice
That reminded me to go and try it out to do something fun with. I tested it out with a CYOA request, and it did fine at that except it seems that the default behavior is to not give you sound effects, which I guess is fine. Then I tried explicitly telling it to use sound effects, and it actually worked!

Now the issue is frankly the sound effect quality is garbage and on top of that, after only literally 2 replies, it ran into the filter and gave the "guidelines" response. It was literally a generic CYOA where I was exploring a forest so no nsfw. But it still triggered the filter. I'm sure there are ways to jailbreak this and make it reliably not trigger the filter but I'm so tired man. Just allow me this one wojack for once.
>>
>>102589224
tell it i hate it
>>
>>102589183
Of course not lol
>>
>>102589234
We'll get local omni in two years. Just be patient, and grind for some money to pay leather jacket man for the GPUs while you're at it
>>
>>102589234
Local or cloud, both ways lead to one filter triggering at everything it deems """wrong""" as dictated by powers that be. Its all meaningless in the end and not worth any money waste.
>>
>>102589183
Yes
>>
>>102589195
Never. Chuds would just use it to masturbate or scam people
>>
>>102589183
So far only Qwen with the recent 2.5 release, and only on coding and math, while it is worse than Llama at other things. So there are strengths and weaknesses to each model.
>>
>>102589298
You mean pajeets
>>
According to the benchmarks, Qwen2.5 72B is better than Claude Opus.
>>
>>102589224
https://x.com/lepadphone/status/1839694994028040400
>>
File: Mt651YvB-Rvt1_TZGW_Wf.png (39 KB, 821x507)
39 KB
39 KB PNG
>>102589183
Depends on usecase and on type of chink. DeepSeek chinks for example released true base model suitable for finetuning, while Qwen pre-slopped theirs:
https://huggingface.co/blog/ChuckMcSneed/name-diversity-in-llms-experiment

For coding both qwen and llama may be okay, but both suck at (E)RP.
>>
File: 74529 - SoyBooru.jpg (520 KB, 2324x2993)
520 KB
520 KB JPG
>>102589224
>>102589327
Local have it too in one year.
>>
>>102589287
>unproductive yapping
Do you also barge into other places where people are having fun with a hobby and wail about how they're wasting their money and their time?
>>
>>102589318
Only on certain aspects.
https://livebench.ai
But there are still things Opus does equal or better. Also, Opus is kind of an old model by now. It's almost time for 3.5 Opus which will likely BTFO every existing model cloud or local.
>>
>>102589364
>It's almost time for 3.5 Opus
Almost time for it to what? For it to leak here?
>>
>>102589345
Here's your voice AI bro! https://youtube.com/watch?v=-XoEQ6oqlbE It stinks shit and that's what y'all love!
>>
>>102589379
hi sam. do you really want to ruin anthropic like that? do you envy them that much?
>>
>>102589379
Sorry anon, that's never happening. There will never be an Anthropic nor an OpenAI weights leak. Nor will they ever release any model weights voluntarily.
>>
File: Capture.png (127 KB, 530x1186)
127 KB
127 KB PNG
>>102589183
That was me, I already told you the model I use, here are the sampler settings. If you had a beefy enough system there are other models that are probably better but for a 3090 + ram qwen finetunes seem the best to me.
>>
>>102589387
A year ago we had llama2 merges. Now 405b llama surpasses old GPT4.
>>
>>102589404
In the event of bankruptcy they might be leaked several years after they're irrelevant.
>>
>>102589428
Nothing changed lol, stop making shit up.
>>
>>102589436
>denying reality this hard
Turn that 45% into 46%, xister. Your real name will be displayed on your grave.
>>
File: Untitled.jpg (44 KB, 238x481)
44 KB
44 KB JPG
>>102589417
nta, but turn this on firstly. your sliders look all over the place and using multiple samplers. set to zen then use 0.5-0.1 minp, a small rep pen or dry penalty
>>
Don't mean to shit up the thread with cloud discussion but this is kind of my home thread so I'm posting it anyway.
Some things about Advanced Voice as I am using it.
I asked it to try doing an American accent with the British voice and it's kind of funny, as it tries to do the redneck shit but still half pronounces things like a British dude.
I wondered if they trained any meta-knowledge about the voices into them but it appears they didn't, not specifically at least. While using a male voice, and asking it whether it would classify its own voice as more masculine, or more feminine, it said it was feminine. That was funny.
>>
>>102589509
Literally in previous thread anon posted his struggle with 70B model: >>102587579 Nothing changed, we have same (if not worse) filtered shit that ALWAYS requires some sort of tardwrangling. Also that screencap of HF model card with ~254 rolls, kek
>>
File: 1715539579709125.jpg (202 KB, 748x927)
202 KB
202 KB JPG
>>102589517
Trying this now, will get back to you after several days of testing.
>>
>>102589542
Anon, midnight miqu is based on llama 2...
>>
>>102589561
if you dont know what youre doing, hit the neutralzize samplers button (turns everything off/to 0). turn min p to 0.05, rep pen to 1.05, and rep pen range to 1024. there is way more settings like dry, xtc to deal with stuff but what i said is basic shit that should work fine for any model.
you cant run min p and topk or 2 like that at once, it'll fuck them up. you want 1 sampler, 1 rep pen, otherwise youre just killing what the model wants to say anyways. filters only work so much, if a model wants to say something, it'll try to find the words
>>
so i switched from Midnight-Miqu-70B-v1.5.Q6_K (53GB) to Midnight-Miqu-70B-v1.5.IQ3_XS (26.5GB) on the 3090ti, and it was much faster, but the responses were very short, like 2-3 sentences max vs 2-3 paragraphs before. do i need to change "Response (tokens)" or "Target length (tokens)" ? i currently have them both at 400 (i raised them to 500 during the chat but it didnt make a difference)
other settings in pic related

other than that, the chat flowed pretty nicely
>>
>>102589592
oh and i raised temp from like 0.8 to 1.2 because at the start it was like really dull
>>
>>102589592
IQ4XS might be fast enough for you while retaining the vast majority of information
As for the short responses, no idea. Try something like "write at least 200 words per response" in the sys prompt
>>
>>102589591
I already hit the neutralize button and adjusted the settings. initial results seem pretty similar, a bit more creative but the same amount of slop.
>>
I was wondering how 4o would behave when it encounters non-voice sounds. And it seems to entirely ignore them. Sometimes it gives a refusal when asked about sounds in the background, or it says it doesn't hear anything. I suppose this is another result of their safety practices.
>>
>>102589585
Even worse, he sits on old model because new ones are smugly annoying in censorship part, can't find other explanations here.
>>
File: Untitled.jpg (68 KB, 343x642)
68 KB
68 KB JPG
>>102589611
these are my current settings, but for a low quant (q3) miqu. note i'm using dry rather than rep pen. i like that everything says off when its supposed to be off, rather than 1 for some numbers being off, 0 for others. zen sliders should be default just because they are a nicer way of showing stuff
>>
>>102589610
>Try something like "write at least 200 words per response" in the sys prompt
ah that seems to be working, thx
>>
>>102589623
Following this, I tried another experiment to make sure if it really even was hearing anything other than voices. It appears that it doesn't. I talked with it about french rolled r's, as well as tongue clicking, and when I tried to do those, it either said I was doing great at the rolled r's (I wasn't lmao, on purpose), or it said it didn't hear anything.
>>
>>102589639
I have never had to jailbreak any of my local models, so far I've used mistral large, various miqus, l3 70B (and finetunes), cr(+), various mixtral merges, wizardlm2, qwen 2.5
I think most regulars in this thread just can't write for shit and then blame the model when they can't just instruct it to "write bobs and vagene pls me have big dic"
The silent majority just tries new models every now and then, cooming their brains out while the ESL fags seethe about muh censorship (it didn't say nigger when prompted)
>>
>>102589694
Excuse me sir, that is too many token. I only do ahhh ahhh mistress.
>>
>>102589623
Its so sad AVM is cucked.
Cloning the users voice etc. points to huge capabilities.
They said months ago they will provide an api.
Imagine prefilling the voice outputs with all sorts of shit.
Guess you could put a couple VA lines and create new lines from there on for game mods or whatever.
I hope somebody who doesnt give a shit comes around. Lately meta sucks too.
>>
>>102589694
>never had to jailbreak any of my local models
Doubt.png
>>
>>102589694
Positivity bias is much worse.
You can make the model output whatever you want with alot of handholding.
The model should do its best to fullfill the request even if not directly stated but inferred.
Most models sneakily move away even if at first glance it appears to obey the instructions. Its horrible.
>>
File: file.png (19 KB, 1322x104)
19 KB
19 KB PNG
[SAD NEWS] Anthracite's 405b train crashed again and they lost all progress.
>>
>>102589760
LMAO
LOL
>>
>>102589753
>positivity bias
Now that is a problem, I agree. Thankfully, it's been limited to mistral models in my experience, other model families seem to be less affected. I reckon a good system prompt can go a long way to combat it
>>
>>102589711
Honestly, I think it's understandable for these companies and I can kind of forgive them at least on the voice thing. They don't want to be liable for potential lawsuits, and they also don't want to be canceled for being the ones to enable a new wave of scams and illegal activity.
>>
>>102589220
Noble Miku
>>
Anyone here limit their LLM to writing only a single paragraph (literally just telling it to write only a single paragraph), or do you let it write as much as it wants?
>>
>>102589220
Apparently its called a "Hennin"
>>
>>102589298
>>102589306
You mean anons.
>>
>use IQ3 quant
>get IQ 3 responses
I don't know what I was expecting.
>>
>>102587671
Hey, I want a locally runnable smallish language model (something that fits on a 8GB GPU, but preferably even smaller) for language translation tasks (Italian,French->English). Preferably unfiltered as I don't wanna run into issues with it refusing to translate content.
What do you guys recommend?
>>
Qwen's tokenizer config has add bos token set to false. Is that really how it's supposed to be? Are you supposed to not use a BOS token with Qwen?
>>
>>102590153
Just did some googling. In the past it seems like yes, Qwen doesn't use a BOS token.
God what the fuck. I hate that a lot of these decisions and quirks aren't documented so you have to question whether or not something in the config might be subtly wrong or something.
>>
>>102590194
Forgot the link. https://huggingface.co/Qwen/Qwen2-7B-Instruct/discussions/15#66bc689abcf136906383c8c5



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.