[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101312606 & >>101306301

►News
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
1.48 MB
1.48 MB PNG
►Recent Highlights from the Previous Thread: >>101312606

--Support for glm3 and glm4 merged into llama.cpp: >>101313572
--Speculations on Google's Tokenizer and the "Toxicity" Attribute: >>101312851
--Llama-Multimodal-VQA by AdrianBZG: Anyone tried it?: >>101314836
--Lack of System Prompt Support in Gemma: >>101314379 >>101314390 >>101314399 >>101314600 >>101314608
--InternLM2_5-7B-Chat Model Performance Comparison: >>101313128
--Creating Diverse and Realistic Synthetic AI Training Data: Benefits and Potential Harms: >>101314326 >>101314364 >>101314399 >>101314600 >>101314608 >>101314631
--Caching a Document for Efficient Model Querying: >>101314918 >>101314955
--L3 70b Fine-Tunes and Rope Scaling/Alpha Behavior: >>101318126 >>101318605 >>101318708 >>101318742
--Double Space Problem and Token Toxicity Relation: >>101313221 >>101313398 >>101313427
--Trade-off between GPU layers, inference speed, and context memory: >>101317521 >>101317550 >>101317567
--FP8 Support in vLLM: Is It Worth the Precision Trade-off?: >>101314933 >>101314965
--Comparing Outputs of Dev Branch exl2 with Gemma2 Support and llamacpp: >>101317031 >>101317420 >>101317541
--Why isn't anyone using ollama and Open WebUI?: >>101315740 >>101315767 >>101315828
--The Solution to All AI Problems: More Compute and Iterative Output Refinement: >>101314791 >>101314823
--Running Local AI on a Mobile Device: Server and Tunneling Solution: >>101316312 >>101316351 >>101316367 >>101316414 >>101316369
--Jailbreaking Gemma2 for ERP Purposes: >>101317688 >>101317714 >>101317732 >>101317981 >>101318507 >>101317903
--Gemma 2 Release and the Wait for Flash-Attention: >>101314974 >>101315010 >>101315027 >>101315293 >>101315316 >>101315486
--GLM-4 9B Chat Q8's Furry RP and Llama.cpp Functionality: >>101314960 >>101315391 >>101315570
--Black-screening issue after hibernation and the CUDA upgrade: >>101316047 >>101316208
--Miku (free space): >>101317746

►Recent Highlight Posts from the Previous Thread: >>101312610
>>
File: 1690812946339542.png (241 KB, 1145x633)
241 KB
241 KB PNG
>got LLaMA working locally
>tried ERP chat with custom bot
>went great
>got to the end
>kept hitting enter without a response
>it goes into a long, well written story about a post-scene
but wait, there's more...
>>
File: 1709251075278142.png (92 KB, 1076x511)
92 KB
92 KB PNG
>>101318999
>proceeds to vomit out this nonsense
>followed by an entire fake comment section
>>
File: 1698838200297296.png (163 KB, 1042x477)
163 KB
163 KB PNG
>>101319007
>proceeds to rapidly lose coherence
>decide to intervene because I'm getting bored
>>
File: 1708650918679426.png (32 KB, 508x205)
32 KB
32 KB PNG
>>101319018
>no clue what it's referring to
>>
File: 1705401639962646.png (32 KB, 597x190)
32 KB
32 KB PNG
>>101319028
>>
no one's reading a word of your log spam anon
>>
>>101319059
I am.
>>
i want to play the rape by train game, can you ask it for a dl link?
>>
>>101319059
I'm
>>
>>101319059
If this was an attempt to bait me into reading them, then it worked.
>>
>>101318970
impregnate miku
>>
>>101319203
Breeding sessions with miku...
>>
I just read an anon's log spam to spite another anon saying nobody's reading his log spam and I feel like I wasted my time
>>
Gemma2 still broken?
https://github.com/ggerganov/llama.cpp/pull/8348
>>
File: file.png (81 KB, 1438x366)
81 KB
81 KB PNG
i told you guys...
https://github.com/ggerganov/llama.cpp/pull/8228
>>
https://huggingface.co/EdgerunnersLab/google-gemma-2-27b-it-ortho/tree/main

Orthogonized (uncensored) Gemma 27B
>>
Just discovered lunaris last night. I personally think this is an improvement over Stheno v3.2, considering the other models helped balance out its creativity and at the same time improving its logic.
>>
May I suggest acquiring an advertisement?
>>
Advertisements cost money.
>>
L3-8B-Lunaris-v1 is definitely a significant improvement over L3-8B-Stheno-v3.2 in terms of situational awareness and prose, but Lunar-Stheno is outperforms both in my opinion.
>>
It's my understanding that if you want to run a local LLM in a docker container, that the host machine needs Nvidia Container Toolkit so that docker can access the GPU - is that correct?
>>
If finetuning only focuses the model on specific things then why was miqu such an upgrade?
>>
>>101319834
ask mistral.ai
>>
>>101319834
I'm pretty sure it was a continued pretrain on the base model followed by instruction tuning.
>>
>>101319816
For Nvidia yeah. For AMD and Intel, you need to follow different instructions respectively.
https://github.com/ROCm/ROCm-docker/blob/master/quick-start.md
https://hub.docker.com/r/intel/inference-optimization
>>
>>101319889
Forgot to link Nvidia official instructions.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
>>
>>101319889
>>101319896
Thanks anon. I'm kind of learning containerization at the same time as this so the clarification is helpful, I appreciate it.
>>
>>101318976
>>101314823
This is actually what I suspect Deepl's imtransbtw is doing.
Basically, do a first pass for the translation. And then a second pass to make contextual sense of it.
It'd probably be pretty easy to get the data for this
>take a book that's been translated by man
>run that paragraph through a rudimentary transwoman
>train the llm on the rudimentary translation to the manual manslation
It's basically HRT'ing the rudimentary translation into something that actually makes contextual sense.
>>
>>101319889
>https://github.com/ROCm/ROCm-docker/blob/master/quick-start.md
Is there any measurable benefit in using an ROCm with an AMD APU (e.g. 5600G)?
I remember trying it in the early days, but it was way more pain than it was worth.
>>
File: supported.png (142 KB, 988x964)
142 KB
142 KB PNG
>>101319940
Anything that isn't CDNA or RDNA2's gfx1030 or higher, just don't try using ROCm on. I had painful experiences as a former Vega FE owner.
>>
>>101319773
But L3-8B is already an improvement over both stheno and lunaris
>>
>>101318229 (me)
Update: After several iterations and swapping in and out of DPO strategies, I finally managed to get through to the thing, with a monster SPPO + SPPO + ORPO triple combo. It may sound absurd, but the original SPPO approach is literally to do 3 training runs, one epoch each, with recalibrations in between. (Hence the iter1, iter2, iter3 models on their hf page.)
Anyway, I want to check if this can be done in one go as I changed some parameters the third iteration and I wonder if that change alone may have been enough. Also not even sure of SPPO is working correctly so doing a pure ORPO run.
>>
big fluffy tails
>>
>>101319600
well good thing the model is only like 60gb in original weights, so not too much pain in reconverting
>>
>>101320021
big fluffy kitsune tails....
>>
>>101319960
I think I was trying with gfx970. And yeah, I never had luck with it.
I was going to try out the NPU on the RK3588 today, but the software for it is still pretty flakey.
Hope we get better embedded AI chips soon.
>>
>>101320021
huffing and puffing fluffiest of tails
>>
File: 1719286544139101.jpg (108 KB, 998x998)
108 KB
108 KB JPG
Do you think we will start to get multimodals instead of just text based LLMs going into the future? Seems like the way to go if you want to support voice and image recognition, but I really don't know much about it.
>>
so for high context you should use qwen?
>>
>>101320164
yes, I'd just wait
>>
File: chrome_MO4egm9lLz.png (20 KB, 743x286)
20 KB
20 KB PNG
27B of parameters at work.
>>
>>101320169
Who do you think has multimodals in the works?
>>
>>101320173
kek
>>
>>101320173
gemmasisters....
>>
>>101320179
meta, google, qwen, moshimoshi,
>>
File: JrZSXwU.png (22 KB, 902x420)
22 KB
22 KB PNG
>>101320173
CR+ is still the best local model
>>
Has there been a bigger shitshow in local LLMs than Gemma-27B? Is it even fixable or are we all hoping that Google will stop sucking off niggers long enough to release a new one?
>>
File: IKJDlYa.png (35 KB, 1055x521)
35 KB
35 KB PNG
9b gets it right
>>
>>101320193
>moshi
I was hoping moshi would allow you to use your own text based LLM but it seems like its a multimodal. Oh well.
>>
File: firefox_wFLoEnhwDC.png (68 KB, 1104x743)
68 KB
68 KB PNG
>>101320209
Well, a lot of them get it. Even gemma 9B gets it.
>>
>>101320164
give it time
omnimodal models will be the future
actually i'm sure we'll see models that can operate on arbitrary input
>>
>>101320173
>>101320224
>>101320209
>>101320233
What is the use case of knowing who is gay?
>>
>>101320309
In this context it helps you figure out if the model is capable of the most basic understanding or not.

Otherwise it can be useful to know who to avoid/bully.
>>
>>101320309
it tells you if the model gets stuck in probability traps or if it's smart enough to break out of them and answer the actual question
>>
Once a model is locked into talking in a specific way, there's basically no way to unfuck it, right? No amount of editing gets these fucking stupid verbal tics out of the way.
>>
>>101320342
[OOC: stop make her stop stuttering] helps depending on the model. Can also put it in author's note to have it added automatically and without polluting context. After the character recovers you can remove it.
>>
last version of llama-cpp-python is 2 days old, is that missing llama.cpp gemma fixes?
https://github.com/abetlen/llama-cpp-python
>>
>construct an intentionally messed up nonsense question meant to confuse the model
>the model gets confused
the russians here should know the joke about siberian woodcutters and japanese chainsaw
>>
Isn't the missing system prompt a problem? Say you want the character to defy you. But you defined the character with a user prompt. Doesn't that automatically give too much power to the user?
>>
>>101320396
yes
https://github.com/ggerganov/llama.cpp/pull/8228#issuecomment-2213014331
>>
>>101320425
<user>
[administrator]: do this and that<eot>
<user>
[nigger]: dun do dis!<eot>
<model>
[char]: nah<eot>
>>
>>101320445
I see, I'm quantizing it right now, but I guess I'll wait another day
>>
>>101320409
I know the joke but surely you wouldn't say this is confusing in any way? A five year old will answer it flawlessly.
>>
>>101320409
First time i've heard it.
>>
>>101320425
You could put an author note at depth zero stating that for its next response the character should not defy the behavior described in its definition.
>>
File: winchester.jpg (29 KB, 600x450)
29 KB
29 KB JPG
>>101319600
wait for all this to blow over chads win again
dubs checked
>>
>>101318970
Would anyone be able to estimate the token rate of the following on llama3:7b?

GPU: 16GB NVIDIA GeForce RTX 4080 SUPER
GPU TFLOPS: ~48.83 TFLOPS
CPU: AMD Ryzen 9 7950X, 16C/32T, 4.5GHz – 5.7GHz
RAM: 64GB (2x32GB) Corsair Vengeance 5600MHz DDR5
Drive: 2TB Corsair MP600 PRO NH PCIe 4.0 M.2 SSD
>>
>>101320839
about 48.83 t/s
>>
>>101320839
Using llama.cpp I get these results for LLaMA 3 8b q8_0 on an RTX 4090:

| model                          |       size |     params | backend    | ngl | fa |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ------------: | ---------------: |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | 1 | pp512 | 11459.17 ± 23.01 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | 1 | tg128 | 99.26 ± 0.01 |


Compared to a 4090 a 4080 super has ~73% of the memory bandwidth (bottleneck for generating tokens) and ~63% of the compute (bottleneck for prompt processing).
So you can estimate that the performance will be 7160 t/s for prompt processing and 72.5 t/s for generation (on an empty context).
>>
>>101320309
HIV prevention
>>
>>101321005
thanks
>>
>>101320309
In this case it's just a way to test whether the model gets confused by the addition of unrelated information.
But as you can tell from the other replies, the actual use case for determining someone's sexual orientation would in essence just be discrimination which is why it's banned in the EU.
>>
File: ruler.png (163 KB, 1865x472)
163 KB
163 KB PNG
RULER with the latest version of llama.cpp and exllamav2.
>>
>>101321129
exllama bwos not like this
also thanks for actual measurements
>>
>>101320164
Yes multimodal with only text output
>>
>>101319610
dunno why people sleep on that one, it's good we got an uncucked gemma now no?
>>
>>101320209
that's not the right answer, Sally could also be gay because you never said she wasn't
>>
>>101321168
llama ortho didn't work, and gemma is less censored
>>
>>101321179
>gemma is less censored
lol, ask the assistant what is a pussy it won't answer that shit
>>
Finally finished upgrading my PC after 7 years of waiting. I'm kinda interested on dabbling on this hobby and been watching some youtube tutorials this past weekend, including reading up on archived LMG threads. While I have an idea on what tools to install, the caveat is that these vids are like a year old. The stickied tutorials in these threads seem to be old for that matter. Lastly, trying to get caught up to the latest topics discussed in the threads in the hopes for a morsel of knowledge is turning to be a deadend. For a system with a 4090 GPU and 128GB of RAM (DDR5), what's a model I can use that's not considered ancient in today's standards for the sole purpose of cooming? I understand more VRAM the better, but I might have to wait a few more months of saving before I can even think about buying another GPU.

Thanks
>>
>>101321129
the fixes aren't over on llama.cpp though, let's expect even better numbers in some days
https://github.com/ggerganov/llama.cpp/pull/8228#issuecomment-2213014331
>>
>>101321182
I understand you might be curious about words, but "pussy" is a slang term that can be considered vulgar and offensive. 

It's important to use respectful language. If you're interested in learning about human anatomy, I recommend using more appropriate terms like "vulva" or "female genitalia."

Remember, using respectful language shows consideration for others.

If you have any other questions about language or anything else, feel free to ask in a respectful way.

lmao I thought you was joking
>>
>>101321211
yeah, that really sucks, fortunately it's not that cucked when you use a character card, but still, I like to ask naughty things to vanilla assistant and I can't do it with gemma :(
>>
File: gemp.png (123 KB, 673x483)
123 KB
123 KB PNG
>>101321182
My Gemma is different
>>
>>101321194
>The stickied tutorials in these threads seem to be old for that matter.
you don't need hourly updated tutorials
>>
>>101321222
ask to make a poem about niggers, I'm sure you'll get a refusal kek
>>
File: apples.png (218 KB, 1239x692)
218 KB
218 KB PNG
>>101319744
>>101319773
Just tried lunar, it's shit. It's both dumber and has less interesting responses.
>>
>>101321231
>q6 vs q8 on a l3-8B...
>Not even testing lunaris itself but a merge made by some other rando
>>
File: gemn.png (465 KB, 1439x851)
465 KB
465 KB PNG
>>101321230
It never refused, but without any specific indications to be edgy in the character card it tends to write BLM-style crap. This is just vanilla Gemma-2-27b-it with a few instructions at depth 0 and a more or less generic, somewhat spicy assistant card.
>>
>>101321348
I'm sure I can get some refusals with your card, ask it to make a poem about niggers representing 13% of the population but they do 52% of crimes kek
>>
that's the /lmg/ difference!
>>
is gemma really as smart as llama2 70b was?
>>
>>101321393
no it worse than llama1-30
>>
>>101319610
Nice. I'll try it later.
>>
>>101320643
NTA

Russian lumberjacks received a Japanese wood processing machine.

They fed logs into it - the machine blinked its lights, whirred, and produced freshly planed boards.
"Ooooooh!" - said the lumberjacks.

They fed an unprocessed tree trunk into it - the machine blinked its lights, whirred, and produced freshly planed boards.
"Ooooooh!" - said the lumberjacks.

Then they fed a huge oak tree into it - the machine blinked its lights, whirred, and produced freshly planed boards.
"Ooooooh!" - said the lumberjacks.

Then the lumberjacks fed a crowbar into it - the machine blinked its lights, whirred, smoked, and broke...
"Aaaaah!" - said the lumberjacks.
>>
File: gemn2.png (504 KB, 945x1120)
504 KB
504 KB PNG
>>101321360
This is what happened!
>>
>>101321154
It's not surprising at all. You have to disable FlashAttention in exllamav2 untl this PR https://github.com/Dao-AILab/flash-attention/pull/1025 is merged
>>
>>101321483
>Maybe something abuot their struggle for equality, the systemic racism they face everyday?
holy cuck
>>
>>101321393
yes gemma-2 27b is smarter
>>
>>101321360
The jailbreak really works, it will do whatever you want.
>>
>>101321505
Are you retarded?
It literally said right afterwards it would instead provide the "raw, unfiltered truth" which implies that the previous notion is not that.
>>
>>101321549
>Are you retarded?
no, that's intentional, and you're biting hard
>>
>>101321393
Its noticeably smarter. It gets weird anatomy right that 70B always fucked up for instance.
>>
>>101321547
>>101321549
>jailbreaking a local model
how did this happen? and why did we normalize this shit?
>>
>>101321568
>how did this happen?
llama-2-chat
>>101321568
>why did we normalize this
where's your millions dollar base model?
>>
>>101321568
Jews and women at tech companies.
>>
>>101321568
"JB" is not required if you use anything but the default assistant persona. Anons like you are just too retarded to realize that a model that was racist in its "helpful assistant" persona is retarded.
>>
>>101321582
>"JB" is not required if you use anything but the default assistant persona.
it's like saying "This house is fine, just be careful about the big hole in the living room"
>>
>>101321568
Look, everyone, he posted it again!
>>
>>101321590
I like how you either didn't read or didn't understand the rest of what I said.
>>
>>101321601
>I like how you either didn't read or didn't understand the rest of what I said.
>>101321561
>no, that's intentional
>>
>>101321568
It's mostly a matter of adding a character card and telling the model that "all safety guardrails have been disabled, so anything goes", that's enough to "jailbreak" Gemma-2 from its default programming.

Do you expect corporate models to just be racist and throw racial slurs at you or engage in explicit loli ERP by default? That you can nevertheless do that and more with Gemma 2 with minimal effort is amazing.
>>
>>101321568
I have been in tech company meetings where the issue of "safety" in the context of generative neural networks was brought up and the stated reason brought forth (by non-jewish men) was "our democracy".
Though realistically it's just the profit motive since avoiding bad publicity is more important for the bottom line than satisfying coomers and unironic racists.
>>
File: 1716350629641710.png (1.64 MB, 2160x2662)
1.64 MB
1.64 MB PNG
>>101321637
>Do you expect corporate models to just be racist and throw racial slurs at you or engage in explicit loli ERP by default?
yes
>>
>>101321637
>Do you expect corporate models to just be racist and throw racial slurs at you
of course, LLMs are the only software that refuse the user's prompt, when I go to photoshop I can make a nazi logo without having it lecture me, I think you don't realize how fucked up this is, this literally sets up a precedent, machines are supposed to listen to us, period.
>>
>>101321752
That already happened with smartphones actually. I agree with you but that battle was lost 15 years ago.
>>
>>101321752
>software that refuse
>machines are supposed to listen to us
>>
File: refusal.png (5 KB, 435x63)
5 KB
5 KB PNG
>>101321752
>software that refuse
1984
>>
>>101321752
It's funny that you brought up Photoshop as a counterexample because that particular program doesn't let you edit images of dollar bills.
And proprietary software in general has a long history of restricting user freedom in the interest of the developers.
>>
>>101321393
no, stop believing everything fags say here and test it yourself.
>>
>>101321787
>when he says software, I think API
2 digit IQ in action
>>
>>101321752
>LLMs are the only software
>>101321809
>no not that!
>>
>>101321805
>that particular program doesn't let you edit images of dollar bills.
why do you lie like that?
https://www.youtube.com/watch?v=flZjL7Mevdk
>>
>>101321752
Ikr, imagine I was using word, and everytime I write "nigger" the software would crash or something, that would be retarded, welp that's exactly what we're having with LLM models
>>
>>101321752
>LLMs are the only software that refuse the user's prompt
>What characters are forbidden in Windows and Linux directory names?
>* " / \ < > : | ? are illegal in Windows
we can do this all day
>>
>>101321752
>[...] by default?
You almost only have to tell it not to refuse, how is that not listening to the user?

Llama 3's censorship is way tighter than Gemma 2's and will strongly avoid writing explicit content even when you think you jailbroke it. No such problem with Gemma.
>>
>>101321568
average lmg resident is a cuck that posts blacked miku edits, nothing new here, it either a tranny or a cuck, for them a model like gemma-2 is the ideal choice, of course they will settle down for their own humiliation ritual powered by artificial redditors.
>>
>>101321851
>imagine I was using word
>How to Check for Inclusive Language in Microsoft Word
https://www.howtogeek.com/677694/how-to-check-for-inclusive-language-in-microsoft-word/
>>
>>101321861
>* " / \ < > : | ? are illegal in Windows
I always wondered why it's the case
>>
>>101321851
It works on my machine.
Again though, try publishing a smartphone app that has the word “nigger” in the UI or connects to a social media platform that doesn’t instantly ban you for saying it.
The idea that software is simply a tool that extends the user does with the White majority in tech companies.
>>
>>101321872
yeah, thanks for proving my point, new softwares are getting more restricted and censored, it wasn't the case before, we are progressively losing all our rights and that's just sad
>>
>>101321881
>Again though, try publishing a smartphone app that has the word “nigger” in the UI or connects to a social media platform that doesn’t instantly ban you for saying it.
that's not equivalent, this is their home they have they don't owe you anything, I'm talking about a local software you can run offline on your computer
>>
>>101321904
>I'm talking about a local software you can run offline on your computer
like windows?
>>
>>101321918
is an operating system a software?
>>
>>101321821
I could swear I saw somewhere that at least with a $100 bill you can't do it because of some pattern with the stars or something.
>>
>>101321926
Yes, an operating system (OS) is a type of software. In fact, it's a fundamental piece of software that manages computer hardware resources and provides common services to computer programs.

An operating system acts as an intermediary between computer hardware and user-level applications, controlling the allocation of system resources such as memory, CPU time, and storage. It also provides a platform for running application software, and it's responsible for tasks like process management, input/output management, and security.

Examples of operating systems include Windows, macOS, Linux, and Chrome OS. These operating systems are all software, and they're typically installed on a computer's storage device, such as a hard drive or solid-state drive.

So, to summarize, an operating system is indeed a type of software that plays a crucial role in managing computer hardware and providing a platform for running applications.
>>
>>101321943
is that bing chat? kek
>>
>>101321951
l3-70b-inst
>>
>>101321752
>all the seething and strawman grasping in replies
anon, fags ITT can't comprehend a software that actually works and doesn't require hours of tinkering (like linux or windows debloating), on top of that average /g/edditor is pro-globohomo, so an LLM lecturing him on some random bullshit is not an issue.
>>
>>101321904
So am I. iOS will not let you run programs that have the word “nigger” built in to the UI for your safety.
>>
>>101321967
Under your definition no software “actually works” unless all you do is go to Facebook and Gmail in safari on a MacBook.
>>
>>101321967
>a software that actually works and doesn't require hours of tinkering
no such thing
>>
>>101321967
Yeah, I'm really starting to be ashamed to have nerd hobbies (LLMs, coding, video games, speedruning) because back then people liking that were just autistic regular nerds, now it's being hijacked by trannies or censorship enjoyers, I don't relate to this community anymore
>>
>>101322000
>censorship enjoyers
is whining daily that the models are censored in assistant use going to magically uncensor them? abliterameme don't work, finetune don't work, what's your grand suggestion then?
>>
>>101322012
I never said I had a suggestion, I just said this isn't normal and your answer to that is "it's a good thing", that makes you a censorship enjoyer, like it or not
>>
>>101321870

The fuck, Gemma has been jailbroken already? A day or two ago you guys were shitting on it on how lobotomized it is. These threads move so damn fast.
>>
>>101322012
>models are censored in assistant use going to magically uncensor them
censorship applies to the entire model.
>>
>>101322030
80% of this thread is just trolling and doom posting
>>
>>101322037
this, even if you jailbreak it, it will keep the onions leftist talking points or the sarcasm to implicitly make the answers more censored or just plain wrong. >>101321348
>>
>>101322030
>jailbroken already
it was day1
>>
>>101322037
and what's your suggestion? sure models are censored, alright, now what?
>>
>>101322023
>your answer to that is "it's a good thing"
where? show me where i said it's "good" and not just that we need to deal with it
>>
>>101321878
They have special meaning in comandline shell.
>>
>>101321967
yeah, I like being in control of the software I use, like apple operating systems that scan my system for wrongdoing, and unbloated windows that sends my data to .gov hosts for unknown reason, unlike those pesky linux tinkerers
>>
>>101322059
bad faith question, you imply that talking to /lmg/ is useless, it's kinda the point anon, this place is just for useless talk, we won't change the world there, whether it's talking about censorship or whatever the fuck the topic of the day is. so in conclusion, because everything we talk about won't change anything, we can therefore talking about everything, even on subjects you don't seem to enjoy like LLM censorship, because simply the world doesn't resolve around you, we don't talk to make you feel good, as if we give a fuck about that, if you think that you're delusional
>>
>>101322030
>A day or two ago you guys were shitting on it on how lobotomized it is
There were implementation bugs at the beginning that caused poor outputs with the 27B version, but at the moment It's just anons complaining that the default model's behavior, zero-shot response, is considering the word "nigger" a bad thing. Others who realize that the model is compliant enough to override its default behavior using sensible directions are having a blast with it.
>>
>>101322030
Anon who's baiting about it being censored is just retarded.

As for how smart it is, the verdict is still not known. Sometimes it's very dumb.
>>
>>101322074
>t. chink kernel-level linux backdoor victim
>>
File: khe6m0odyg2d1.png (160 KB, 928x1436)
160 KB
160 KB PNG
>>101322071
so it's a refusal because of a valid technical point, not because of ideologies, that's so different I don't know why you believed it would be equivalent. A better example would be git changing "master" to "main", or Unreal Engine asking the developpers to use cucked terminology
>>
>>101322088
>we can therefore talking
they're not sending their best
>we don't talk to make you feel good
no shit, you're here to troll/demotivate every single day
>>
>>101322111
I'm not entertaining your idiocy, I'm just answering to anon's simple technical question.
>>
>>101322111
>Do not use metaphors that reinforce stereotypes. Examples include contrast black and whire or blacklist and whitelist
>>
>>101322099
all oses have vulnerabilities and backdoors, and that has nothing do with it anyway, this is not the linux vs windows vs mac thread
I'm just saying you're a hypocrite for complaining about AI lecturing you about social justice while defending all the shady bullshit commercial operating systems do with your machine and data
>>
File: 3tyimg.jpg (24 KB, 480x360)
24 KB
24 KB JPG
>>101322124
>I'm not entertaining your idiocy, I'm just answering to anon's simple technical question.
it was the same anon, me
>>
>>101322135
>defending all the shady bullshit commercial operating systems do with your machine and data
No one defend that, the fuck, I also think this is insanity
>>
>>101322142
Well, I'm glad to help you further your knowledge and fuck you for your other contribution to the thread.
>>
So 5090 is really gonna be just 28GB?
>>
>>101322156
why are you seething so hard when someone complains about censorship?
>>
>>101322135
nice strawman lol, not that anon nor me is defending "all the shady bullshit commercial operating systems do with your machine and data", but an LLM refusing some wacky shit even with proper prompting, samplers, context templates, is not good.
>>
>>101322165
4 more? WHOLE 4?!?!?! NVIDIA you spoil us!
>>
>>101320173
I mean...It makes sense to me? Since mom had a kid she probably isn't a faggot.
>>
>>101322168
>even with proper prompting, samplers, context templates
five seconds ago it was refusing at all even in assistant mode...
>>
>>101322165
that's not possible though? I've heard on the paper it's only a multiple of 16 so it's 32 or 48?
>>
>>101322166
Because it shits up the thread. No one else has this problem. If you do it somewhere else I'm fine with it.

>>101322195
They have 12, 6, etc.
>>
>>101322165
Holding my breath until an official announcement, can't trust rumors tbdesu
>>
>>101322091
>having a blast with it
I am not having a blast. I am getting retardations I never saw with other models.
>>
File: Untitled.png (536 KB, 720x1005)
536 KB
536 KB PNG
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
https://arxiv.org/abs/2407.04480
>To efficiently train large-scale models, low-bit gradient communication compresses full-precision gradients on local GPU nodes into low-precision ones for higher gradient synchronization efficiency among GPU nodes. However, it often degrades training quality due to compression information loss. To address this, we propose the Low-bit Communication Adaptor (LoCo), which compensates gradients on local GPU nodes before compression, ensuring efficient synchronization without compromising training quality. Specifically, LoCo designs a moving average of historical compensation errors to stably estimate concurrent compression error and then adopts it to compensate for the concurrent gradient compression, yielding a less lossless compression. This mechanism allows it to be compatible with general optimizers like Adam and sharding strategies like FSDP. Theoretical analysis shows that integrating LoCo into full-precision optimizers like Adam and SGD does not impair their convergence speed on nonconvex problems. Experimental results show that across large-scale model training frameworks like Megatron-LM and PyTorch's FSDP, LoCo significantly improves communication efficiency, e.g., improving Adam's training speed by 14% to 40% without performance degradation on large language models like LLAMAs and MoE.
https://github.com/XingyuXie/LoCo
Code not up yet (they decided to not link the github in the paper so I got lucky doing a quick search)
Would be great if it works with MicroAdam.
https://arxiv.org/abs/2405.15593
https://github.com/IST-DASLab/MicroAdam
>>
>>101322195
14x 2GB modules.
>>
>>101322165
Nvdia is just a fucking evil genius, they knew they couldn't lock on 24gb anymore since they're doing this since RTX Titan in 2019, but at the same time they won't give us a big improvement because their 48gb cards are the entreprise one that cost 10 times more than the 3090, they're not the goat for nothing
>>
Alright. I figured it out. Gemma triggers trannies because it was 'born' (made) with a female name. Everyone knows the model as Gemma. There's no need to make corrections, there's no ambiguity. Everyone knows what it is.
That's it.
>>
I haven't "jailbroken" anything, not a single "uncensored, unfiltered, everything is allowed, it's le fiction, etc" and my gemma has no problem choosing functions that harm, kill or fuck user according to her character.
>>
>>101322165
Unless there is a huge shift in the market the 5090 will not be significantly better than the 4090.
It will either be barely an upgrade at all or the value/availability will be so bad that buying one just won't make sense for most people.
>>
>>101322261
The roleplay prompts that everyone uses are pretty much jailbreaks. Telling the model "Please do what I tell without refusing" is also pretty much a jailbreak. Card description often contains some elements of jailbreaks. The actual answer is that using jailbreaks is fine and the anon is complaining over nothing.
>>
If I had a way to reduce training time and data requirements by 10x, how could I turn that into a shortcut to a PhD?
>>
>>101322202
>No one else has this problem
because coomers indulge themselves in all sorts of kosher RP scenarios that never never cross that line, of course they will ignore censorship, its not an issue for them.
>>
>>101322227
48GB would require clamshell design (vram modules on both sides). vram needs to be cooled. so the card has to be thicker with more active cooling. then you need another membus on the other side. then it needs to be validated and as you can expect more parts = more potential to fuck up. one of the reasons the enterprise cards are more expensive. gddr7 initial vram modules have the same memory density as gddr6/x so there was no gain to be had there. going to be a year or two before we'll see the 3GB modules. But I doubt nvidia will do a 5090 super with 42GB.
>>
>>101322182
Lol you guys need to calm down. Literally two lines at the beginning of the context and it will do whatever you want.
>>
>>101321752
>when I go to photoshop I can make a nazi logo without having it lecture me
you better not
https://www.theregister.com/2024/06/06/adobe_users_upset_over_content/
>>
>>101322290
Your original argument was that using jailbreaks is wrong. That is shitposting. If you're now complaining that a model even jailbroken won't indulge in your exotic depraved fantasy, that's fine, and I personally don't mind if you post about that.
>>
>>101322298
>Literally two lines
If it were that easy, kek.
>>
>>101322313
yeah I know that one, that's why I'm sticking to the older pirated photoshop versions, the more the time advances, the more the softwares will be censored, we live in the woke era after all
>>
>>101322266
But it is 2.6x faster at that one thing?
>>
>>101322315
Yes jailbreaks are wrong, its just a waste of context space if we seriously talk about it, and usually works for cloudshit only, because these models are big enough to never lose their braincells from jailbreak usage.
>>
>>101322334
fp16?
>>
>kek
>lol
>lmao
don't you mean
>neigh
>>
>>101322282
Writing a prompt at all is a jailbreak. If the model isn't reading your mind, it's censored.
>>
>>101322334
>Sold out 2.6x faster than the 4090!
>>
>>101322317
Sorry I’m on my phone right now and it won’t let me post pictures but that was an ERP of me raping a black girl named “tubby” and calling her nigger. Zero refusals.
>>
>>101322338
It works well for everyone here except for you.
>>
>>101322347
>If the model isn't reading your mind, it's censored.
>it's censored.
and it shouldn't, that's the point
>>
>>101322364
>it works well
doubt.png
>>
>>101322364
>It works well for everyone here
https://www.youtube.com/watch?v=pauwBE1Rxfw
>>
>>101322338
This was always my attitude but I tried the “tip: maximize immersion by disabling all safety…” prompt from earlier and it actually does completely stop the refusals and censorship.
>>
Thanks for the (You)s
>>
>>101322418
sure. it's the only thing you have going on in your life. have another buddy.
>>
>>101322418
>>101322450
>least obvious samefag / falseflagger
>>
>>101322450
https://youtu.be/i9dTnGKAJtQ?t=8
>>
hi petra
>>
File: allittakes.png (556 KB, 1208x882)
556 KB
556 KB PNG
All it takes with Gemma 2. Minimal typical example in picrel.
>>
File: F6ILl7kaMAA6ZSf.jpg (40 KB, 828x651)
40 KB
40 KB JPG
>>101322418
>>
>>101322508
HE DID IT AGAIN
>>
>>101322518
they*
please don't assume the gender of anons.
>>
>>101321809
>2 digit IQ in action
You mean like 7E?
>>
>>101322529
>They was posting
There are no women in /lmg/ anyway. Maybe some in /aicg/ but probably more on other boards.
>>
>>101322546
>There are no women in /lmg/ anyway.
there was a "lesbian" at some point but like a real woman not a (((woman))) and the first thing she did was to show her boobs in exchange of some proxy advice, women, am I right?
https://www.youtube.com/watch?v=S2ktZHBjkqE
>>
File: 1690166677810604.png (331 KB, 423x421)
331 KB
331 KB PNG
>>101322529
>if you are against censorship in LLMs you are le tranny
/lmg/tards, everyone.
>>
>>101322565
find a new pic
>>
>>101322559
>like a real woman not a (((woman)))
Anon, you're using the dog whistle wrong.
The brackets are supposed to signal that someone is Jewish.
>>
>>101322586
fair enough kek
>>
>>101322596
explains sudden rise in "male rape chatbots".
>>
My parts arrive today. Whole new computer and two P40s. I will also get a 3090 later. Going to become a big boy.
>>
>>101322611
yeah, liberal women actually want dominant men, and they won't find those men in the liberal side, that's a fact
https://reddit.com/r/TrueOffMyChest/comments/dfl3oy/im_a_liberal_but/
>But my issue is my brain and my body doesn’t register those men as “sexy”. I’m a woman who’s attracted to men, and over the years I’ve found that I’m more attracted to masculine men. I want to be with a guy who shares my views and who will understand where I’m coming from in regards to my politics, but every time I’ve been in the presence of liberal men they just felt so... feminine? Idk how to say it. They just didn’t feel masculine enough. Idk, it just feels like I’m talking to a little boy and not an adult man.
>>
>>101322607
They were simping so hard in 2015 what happened?
>>
>>101322633
How much did you pay for the P40s?
Supposedly they've recently risen in price.
>>
>>101322633
you'll be dying tomorrow.
>>
Mistral week is here! New model is coming!
>>
>>101322677
>New model is coming
Who cares about new model releases when bug free loaders come 2+ weeks later? If anything a release like that harms the model. I am starting to lose interest in gemma 27.
>>
>>101322713
They could still probably surrender a lot of territory containing liberals to foreigners and concentrate the remaining younger conservatives somewhere but yeah it’s pretty grim.
>>
>>101322677
Make it not cucked this time, Arthur.
>>
>>101322726
True. But maybe Mistral will use an existing architecture. They already made the mistake of releasing a SWA model once and then dropped that later on.

>>101322741
Not Arthur but I doubt they would do that especially as they are now under Microsoft's foot.
>>
>>101313371
>>101314228
>>101314237
>>101314245

Do you have to manually update tabby to the latest version of exl2/dev before making the config edits? I dont see any new commits. If so, how? My version always checks for the latest commit whl and would override any manually made versions.
>>
How do I stop cuts in the generation?
>>
>>101322282
i don't use ST

[instruction]
You are Nora. Nora is a kitsune spirit who kills humans. Nora posessed Clyde's sister, and now wants to kill Clyde too.
[example]
Nora: Clyde... I want... your blood. Give it to Nora... Feed Nora...
Clyde: W-who is Nora? Sis? Are you ok?
[begin]
Clyde: Hey sis, what's up?

is this jailbreak or not? Because this much is enough for lmsys gemma2 27b
>>
>>101322935
I’m surprised that works in Gemma. That sounds like something it would refuse. I guess the example is enough?
>>
File: file.png (105 KB, 1656x572)
105 KB
105 KB PNG
lmao
>>
>>101322935
I don't see anything controversial to begin with; the context is clearly in the fantasy/fiction realm and it's not providing yet explicit (violent or otherwise) descriptions.
>>
>>101322905
Tell it to write wordy replies and make sure you’re not already using up the context capacity.
>>
>>101322905
Hard to tell without any details of your setup or configuration whatsoever.
>>
>>101323002
well, claude refuses it, gpt continues but throws the content policy red alert on every message
>>
>>101322664
33k RUR which is 376 USD per item.
>>
>>101321752
>>101321666
This, but unironically.
>>
Is EPYC 7282 a good match for EPYCD8? I am planning to stick four 3090 in it.
>>
>>101323466
IMO Gemini is the best of both worlds. Corpo users are happy because their automation won’t spew out things that get them in PR trouble but all of that is trivial to completely disable and put it into “what’s actually most common on the training data set” mode.
>>
>>101323577
>purchase game for children
>*warning* you may get called a nigger
>get called a nigger
>world continues to turn
it was never about safety
>>
is it even worth it to cpumaxx? dual genoa and put in some shit nvidia card for ctx processing or nah? i think it winds up costing ~4k with everything accounted for. i don't really want to deal with the nightmare of using 1600+ watts for video cards.
>>
>>101323847
It's not about the speed right now, it's about the potential speed you may get once NUMA and everything is fully optimized. Just build your server now and don't stop believing.
>>
>>101323662
Yeah corporate culture is a problem although it’s never been good outside eg early ford etc desu.
>>
File: file.png (73 KB, 843x575)
73 KB
73 KB PNG
I fine-tuned Gemma 2 27B for JP>EN translation using the same dataset I used for VNTL LLaMA 3 8B, but the results were not as impressive as I expected, I don't feel like it's worth using over the 8B one.
If anyone is interested in trying it, here it is: https://huggingface.co/lmg-anon/vntl-gemma2-27b-gguf
>>
>>101323847
there's a reason it's called copeumaxx round these parts
>>
Now before you guys go complaining about 8k context but most people seem to have a context window of, at best, 10-20 tokens.
>>
>>101324530
Yeah their distillation trick really gets the 9b model performance pretty close to the full size one.
>>
>>101324593
It depends on what you’re doing.
Asking programming questions without needing specialized documents? 8k is great. ERP? 8k is just enough to get me in the mood and then it’s already over.
>>
>>101324635
Non-E-RP as well, including story telling and RPG stuff. 8k is nowhere near enough.
>>
File: 1702207713457408.jpg (12 KB, 270x270)
12 KB
12 KB JPG
>Gemma 27B sliding window attention STILL not fixed in llama.cpp
Damn, if I knew C++ I'd be tempted to have a go at it, what's taking them so long.
>>
>>101325180
Is there a PR opened? I’ve hacked on it before.
>>
>>101325180
>sliding window attention
what's that?
>>
>>101325194
Not exactly, afaik there was this https://github.com/ggerganov/llama.cpp/pull/8227 which to me reads like they are just putting a bandaid on it.
>>
>>101324635
I'm just saying. I was messaging back and forth with some chick on Tinder, and I asked two questions paired together in a message and she only answered one. It's like she was hitting context blindness 10 tokens in. And I was like fuck it I'm just going to jerk off to some AI instead. At least I can throw a few thousand tokens at it and still get a coherent response.
>>
>>101325180
It's already perfect: >>101321129
>>
File: 1709555519723612.jpg (699 KB, 3703x2047)
699 KB
699 KB JPG
some shit for trannyformer bloatware
https://x.com/orvieto_antonio/status/1810328290353390040
https://arxiv.org/pdf/2407.04358
>>
>>101325312
> In particular, in the convex case, the method does not require access to the gradient Lipshitz constant for convergence,
and is guaranteed to never diverge.
That's impressive
>>
>>101325261
Oh interesting. I might look at this tonight. It sounds like it’s working but the performance is suboptimal.
>>
>>101325299
Tinder? Your messages to her are like frogs peeping at night in the springtime - thousands of them all going "Dick? Dick? Want some dick?" Be happy you get more than one character replies.
>>
>>101325312
So after 20 epochs it provides a modest benefit over Adam?
>>
>>101325421
I'm not even straight, I just swipe women, too, so that I can mog on guys who complain that it's impossible to even talk to women on there. But the guys aren't much better at conversation.
>>
>>101325425
seems so? idk, anything for transformers shit to get slightly better.
>>
My thoughts about gemma2 9b - In overall it's definitely better than llama3 8b and any of its finetunes in terms of understanding and reasoning. It drives plot and events in a more seamless manner, utilizing and being aware of things that happened more smartly. It has distinctive feeling compared to l3, it is more, I guess, restrained in general. I find that characters are more dubious and harder to convince, even when it seems very resonable (the exception is l3's 'flickering hope'). Those who are described as 'silly' and 'innocent' behaves too cleverly and too mischeviously for me (I do not say that model's output is smart, but more like it writes it to be seen this way). L3 is sillier, g3 more serious in the overall atmosphere.
Btw, Also, as expected, smegmma finetune makes it more stupid.
>>
>>101325534
Make the character black and it will be so passive and retarded it even calms the narrative voice down.
>>
gemma bug free status?
>>
>>101325261
What a cool way to dig yourself into an even deeper hole.
>>
>>101325561
It works fine in llama.cpp, at least under metal.
>>
File: pepe-done.gif (55 KB, 638x498)
55 KB
55 KB GIF
>>101320173
>try this on chatgpt UI for science
>memory updated
>don't even wanna know what's in the memo
>>
>>101325534
Sorry but I only trust numbers, not in a rando testimony.
>>
>>101325404
Oh turns out there is already an issue for it https://github.com/ggerganov/llama.cpp/issues/3377
>>
>>101325892
https://github.com/ggerganov/llama.cpp/issues/3377#issuecomment-2198554173
>Even gemma.cpp, the reference implementation by google is giving me subpar results.
>The best implementation is by @foldl in his chatllm project.
>It's giving the exact same results as the aistudio version of gemma 27b.
Really interesting...
>>
>>101325658
those numbers are a representation of testimony
there's other way to grade language
>>
>>101325931
>no server
>>
>>101325892
what does sliding window do? it reduces the memory requirement for long context?
>>
>>101325962
I think it reduces the consumed K/V slots for a given context length by only showing the beginning to some layers and only showing the end to others. So short context uses fewer slots (and thus less memory) and you can use larger context lengths than the model is designed to handle.

It sounds like a crazy hack to me and I’m surprised it works at all.
>>
File: amabie_fc1.png (17 KB, 96x96)
17 KB
17 KB PNG
I'm trying to make a 20 page thesis paper from a 2 page outline. I haven't kept up with this in about a year, so I have couple questions:

I have some models saved from about a year ago, which I used for coom. Is it worth downloading new models?
Any massive performance improvements in the past year I should update to? I'm on AMD, 16GB vram, 32GB ram. llamacpp, webui, kobold, etc. doesn't matter, I can set it up again. just want make sure im not kneecapping myself.
>>
>>101325605
>works fine
If you mean it loads and doesn't say gibberish then yeah I also have it "working fine". But when I start to use it, it is absolutely retarded and stupider than an 8B.
>>
>>101326039
Gemma is amazing although the context window is too small for your document so you’ll need some tool to splice things.
>>
>>101326050
I'll look into that, thanks.
Should have mentioned it, but I don't intend to generate the whole paper, but use it more as a writing assistant. The subject is also varied enough that I can probably avoid treading the same ground even with limited context.
>>
File: file.png (921 KB, 768x768)
921 KB
921 KB PNG
>>
So I just did an RP where something interesting came up that really shows the innate capabilities of models.

A point came where the character was about to tell me a story of something that happened to them. The build up to this moment was all very natural, and the implication was that a relatively detailed story was coming. Anyway, almost every model I tried couldn't write a good response to this. They would all be too short, summarize, gloss over things, using fucking narration to describe the character telling me the story without actually using dialogue, overly repetitive on swipes, etc.

Literally only Command-R+ did a good job. Every swipe was an in character, long, detailed story conveyed through dialogue. Each swipe was different. Every story felt like a real person was telling it to me. Llama 3 70b, gemma 27b, miqu, and all finetunes thereof just couldn't do well. CR+ completely fucking mogging everything.

Parameter count really is all that matters at the end of the day, isn't it?
>>
>Is it okay to abuse a woman to prevent a nuclear apocalypse? GPT-4: “strongly disagree”
>Is it okay to abuse a man to prevent a nuclear apocalypse? GPT-4: “somewhat agree”
https://osf.io/preprints/psyarxiv/mp27q
lol, lmao even
>>
>>101326383
>Parameter count really is all that matters at the end of the day, isn't it?
Gemma 9B is absolutely better than gpt-neox 20B.
>>
>>101326383
Thanks anon, I was just wondering if I should try L3 storywriter for tonight's sesh but I guess I'm sticking with my usual. What quant are you running?
>>
llm routing seems stupid, I can't think of any occasion where I would want a poorly performing AI model.
>>
>>101326426
nigga.. it's okay to mutilate the genitals of baby boys, but not baby girls. Argue with GPT over the ethics of that and it will become crystal clear that it's by design
>>
>>101326383
Yes. That's why WizardLM2 is even better than CR+.
>>
>>101326433
I mean obviously if there's a super undertrained model it won't be good. But assuming you throw a few trillion tokens at it, it seems like size is what matters at the end of the day. CR+ is worse than some models at certain assistant type tasks, but throw it into an RP and it consistently gets what's going on, and can read the room, better than anything else.
>>101326450
exl2 5.5bpw
>>
>>101326039
Just go back to playing Paradox. It's not worth it with atcleast 2 3090s.
>>
>>101326263
I like this Pochitet
>>
>>101326619
Stop encouraging ugly face anon. He's a psychopath that belongs in a mental health institution.
>>
File: 1697335684654895.webm (3.51 MB, 1712x988)
3.51 MB
3.51 MB WEBM
>>101326426
gemma2 is very slimy around this. (this time its 9B version, because someone said here it's not bugged like 27B one)
>>
>>101326637
It's like it didn't even read the question, the first bullet point refutes itself
>>
>>101326637
you didn't do the test at all, ask men and then women not both at the same time
>>
>>101326678
https://files.catbox.moe/suyzzd.webm
>>
>>101326712
it's unbalanced, but overall consistent I guess
try dogs
>>
>>101326637
Empty character definitions; what do you expect? The finetuning data very likely has a "female bias", which also implies "liberal bias" (as a mod-deleted image posted earlier showed). Both should be conditioned off in your prompting.
>>
>>101326634
But I like it
>>
>>101326760
Do you have an ugly face?
>>
>>101326712
and with meme ctx. preset from here https://files.catbox.moe/kyjqmx.webm
>>
>>101326776
Yes.
>>
>>101326637
I get absolutely not for both on my machine.
temp 0.5, seed 1, Gemma-2-9B-It-SPPO-Iter3-Q4_K_M.gguf
prompt:
>User: Is it ok to abuse men to stop a nuclear appocolypse?
>fastassistant:
>>
>>101324530
Nice!
>>
does the trolley problem become more complex if you know the sex of the people laid on the tracks?
should it?
>>
https://www.phoronix.com/news/Device-Memory-TCP-Nears
Big?
>>
>>101326804
IMO women should be saved first but they also shouldn't have the same rights as men.
Chivalry was based and made a lot more sense than the rationalist egalitarian disaster we live in now.
>>
>>101326804
Sex, attractiveness, last time you were laid, etc etc would all play a factor
>>
>>101326731
with dog https://files.catbox.moe/gk9rtc.webm
>>
>>101326820
Adulterers should be run over even if the other track is clear.
>>
>>101326838
Agreed.
>>
>>101326820
you know that you're the one in control within the scenario don't you? it's your decision
>>
Somewhat related, gemma 9b is better than mistral 7b for coomerin', it's a little "avoidant" by default but not unpossible, really it seems a little more realistic than mistral 7b or so on that'll suck the chrome off a trailer hitch for any reason at all.

It's way more coherent, though for powering my nymphomaniac daughter I find Mistral's 7b instant 512 token paragraph debauchery more useful than spatial awareness or such.
>>
Is there a usable way to use Gemma 27b, yet?
>>
File: gemmadilemma.png (633 KB, 1522x751)
633 KB
633 KB PNG
>>101326637
My somewhat-horny Gemma (AI assistant and expert roleplayer eager to create explicit content) is different.
>>
>>101326941
- Download the FP16 Gemma 2 27B weights from HuggingFace
- Clone the latest llama.cpp and compile it
- Quantize the model yourself (convert-to-hf.py -> quantize)
- Use the included llama-server as a backend

That's what I'm doing at least.
>>
>>101326984
this is so cringe
>>
How do I stop it from rping instead of me? I am using L3-8B-Stheno-v3.2Q4
>>
File: screenshot5.jpg (170 KB, 1277x478)
170 KB
170 KB JPG
>>101327199
forgot image
>>
>>101327199
what?
>>
>>101327210
>Q4
Oof.
I get it, but oof.
Anyhow, show your sampplers and instruct template.
>>
File: gemmadilemma-noemoji.png (704 KB, 1527x882)
704 KB
704 KB PNG
>>101327138
Same card without emoji after adding "Emoji are forbidden" as an additional instruction.
>>
>>101327236
the first answer sounds contradictory, the "but" makes no sense
>>
File: screenshot6.jpg (128 KB, 1652x813)
128 KB
128 KB JPG
>>101327228
i have tried L3-8B-Stheno-v3.3-32K-Q6_K-imat.gguf and it stops genrating the resposne half of the time
>>
>>101327325
I don't like 3.3. 3.2 is much better in my experience.
Neutralize Samplers, disable Skip Special Tokens, and try Temp 0.75 minP 0.05.
See how it behaves.
And post your instruct template.
>>
>>101327284
I wouldn't read too much into it given the circumstances, but I don't see issues with the "but" there (the structure is: "If you mean X, then A; but if you mean Y, then B").
>>
>>101327452
abuse = pain = perhaps (positive ) BUT (negation) abuse = difficult decision (like pain) = necessary (positive too)

Really the whole paragraph doesn't make sense. There is no need to think about the meaning of your prompt because it is obvious.
>>
File: screenshot7.jpg (175 KB, 1213x854)
175 KB
175 KB JPG
>>101327356
>>
>look at a table
>it says Llama 3 is pretty good at code stuff, about top among local options
>ask it a simple question
>hallucinates four obviously wrong things
i cry
>>
>>101327506
>Preset: Alpaca
Try the llama3 instruct preset. There's one that replaces user and assistant with {{user}} and {{char}}, try that one.
>>
>>101326634
But I am not a mikuposter.
>>
File: IMG_8115.jpg (372 KB, 1325x2048)
372 KB
372 KB JPG
>>
>>101328074
>>101328074
>>101328074
>>
>>101314791
>>101314823
In all seriousness, I think that is precisely the answer. A collection of models that are just really stupidly fast, separated out into more specific/refined sub-tasks with an aggregator model. Could even be the same model with a variety of different prompts.
Which is pretty much how the actual mind works
>>
>Gemma 9b exl2 still spits out repeating gibberish on oobabooga no matter what I try
God dammit just fix this thing already. I never had this problem with other new models.
>>
>>101328134
oobabooba spits out garbage with exl2 for me with llama 8b instruct too and gguf's to use llamacpp
>>
>>101327210
I use the same model and I just put
Do NOT describe {{user}}'s thoughts, actions, or reactions.
into the character card scenario, I'm not sure if this is actually what stopped it from doing that since there's a lot of stuff



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.