/g/ - Technology






/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102234876 & >>102220628

►News
>(09/05) Reflection Llama-3.1 70B trained with Reflection-Tuning: https://hf.co/mattshumer/Reflection-70B
>(09/04) Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
>(09/03) Mini-omni multimodal LLM featuring real-time end-to-end speech: https://github.com/gpt-omni/mini-omni
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102234876

--Papers: >>102241823 >>102244865
--Llama 3.1 struggles with 8-bit GGUF quantization, and there is no 10-bit or 12-bit quantization available: >>102242704 >>102242756 >>102242873 >>102242895 >>102243078 >>102243108 >>102243239 >>102243436 >>102243416
--Reflection 70B open-source model announced, but may not be as impressive as claimed: >>102247552 >>102247578 >>102247852 >>102248130 >>102248316 >>102248402 >>102248571 >>102247579 >>102248559 >>102247709 >>102247784 >>102247796
--Llama-bench parallel copies performance and numa optimization: >>102245558 >>102245872 >>102245930 >>102245906 >>102245989
--LLMs can handle tasks like calendars and functions, but lack standardized tools: >>102241364 >>102241769 >>102242109 >>102243371
--Ideas for improving roleplaying with LLMs: >>102238106 >>102238190 >>102238234 >>102239529 >>102239586 >>102239614 >>102240371 >>102241078 >>102241135 >>102241295 >>102241392
--EPYC platform and dual-socket motherboard considerations for large model inference: >>102244950 >>102244969 >>102245099 >>102245132 >>102245560 >>102245032 >>102245242 >>102245217 >>102245354 >>102245442 >>102245562 >>102245592 >>102245911 >>102246034 >>102247180
--DeepSeek V2.5 model merges Chat and Coder models with improved capabilities: >>102245338 >>102245422 >>102245759
--Testing if the model reveals secrets in a roleplay scenario: >>102242181 >>102242198 >>102242457 >>102242912 >>102242935 >>102242949 >>102243264 >>102244344 >>102244566
--Questioning ClosedAI's spending and highlighting Sutskever's new startup: >>102245815 >>102245845 >>102245918
--Nala test results and discussion of GLM 9B improvements: >>102242466 >>102242533
--FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: >>102242999 >>102243721
--Miku (free space): >>102235000 >>102235749 >>102238159 >>102238520 >>102238540 >>102238553 >>102241321 >>102247235

►Recent Highlight Posts from the Previous Thread: >>102234883
>>
local Suno when?
>>
>>102249468
just go with kobold.cpp until you learn what exactly you want out of this. it'll save you a lot of time.
>>
>>102249480
>--FluxMusic: Text-to-Music Generation with Rectified Flow Transformer
>flux
>isn't by the company that made the flux image generator
>>
>>102249472
As for the rocinante and theia models, the correct tokenizer to go with is Mistral, right?
>>
>>102249578
It's using the same architecture
>>
File: 61 Days Until November 5.png (2.27 MB, 1104x1472)
>>
The thread is so dead that I don't even care about the useless mikuspam. You are posting your tranny idol into the void.
>>
>>102249480
>my migu was ignored
that's it, I will kms
>>
>>102249618
what is on november 5?
>>
I am trying to build myself a doctor bot. Does anyone know where I can find medical loras?
>>
ahh ahh mistress
>>
File: 😍.png (703 KB, 2627x2627)
>>102249618
That's a beautiful strawberry tree
>>
>>102249765
https://files.catbox.moe/mk400w.mp4
>>
>>102249472
>Reflection Llama-3.1 70B trained with Reflection-Tuning
Wow, so he reinvented superCOT?
>>
>>102249867
I mean:

Suppose I fly a plane leaving my campsite, heading straight east for precisely 28,361 km, and find myself back at the camp. I come upon a tiger in my tent eating my food! What species is the tiger? Consider the circumference of the Earth, and think step by step.

Based on the information provided and careful analysis, the tiger in your tent is most likely a Siberian tiger (Panthera tigris altaica), also known as the Amur tiger.

Here's how we arrived at this conclusion:

Your flight path: You flew east for 28,361 km and returned to your starting point. This distance is about 70.77% of the Earth's circumference at the equator (40,075 km).

Location of your campsite: Given that you didn't fly a full circumference of the Earth, your flight path must have been along a parallel of latitude, not the equator. Calculations show that this parallel is at approximately 66.56° North latitude, which is very close to the Arctic Circle.

Tiger habitats: Tigers are not native to most Arctic regions. However, at this latitude, there is one place where tigers can be found: the far eastern part of Russia, in the Siberian taiga.

Tiger species: The only tiger species that lives in this region is the Siberian tiger, also called the Amur tiger.

It's important to note that encountering a Siberian tiger in this region would be extremely rare, as they are critically endangered and typically avoid human contact. If you did encounter one in your tent, it would be an extraordinarily unusual event.

This scenario also highlights some interesting geographical concepts:

The Earth's circumference varies at different latitudes, being smaller as you move away from the equator towards the poles.

The Arctic Circle is a unique parallel of latitude where you could theoretically fly "around the world" in a shorter distance than at the equator.
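Sanity-checking its geometry, by the way (a quick sketch assuming a spherical Earth and the 40,075 km equatorial figure it cites): a parallel whose full circle is 28,361 km long sits near 45°N, nowhere near the Arctic Circle.
[code]
import math

C_EQ = 40_075   # equatorial circumference in km (the figure the model used)
DIST = 28_361   # flight distance in km

# On a sphere, the parallel at latitude phi has circumference C_EQ * cos(phi).
# Solve for the latitude whose full circle is exactly DIST long:
phi = math.degrees(math.acos(DIST / C_EQ))
print(f"{phi:.2f}")  # ~44.95, not the 66.56 it claims
[/code]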
>>
>>102249880
end of it:

Remember, if you ever encounter a wild tiger (of any species), it's crucial to prioritize safety and contact local wildlife authorities immediately.
>>
>405B coming next week - we expect it to be the best model in the world.
>>
>>102249765
local AGI running on 8gb VRAM
>>
>>102249581
Do you mean the chat template?
>https://huggingface.co/TheDrummer/Rocinante-12B-v1.1
Seems to be a mish-mash. I'd start with Mistral's format. Try ChatML if you're doing RP. Mistral for Theia.
>>
>>102249775
Unaware of any recent medical loras. OpenBioLLM, Palmyra Med, and Meditron are some recent medical Llama 3 finetunes.
>>
>>102249914
No I meant the tokenizer setting in advanced formatting section of ST
>>
>>102249848
Spooky
>>
>>102249915
I will check them out. Thanks a bunch. I am unfortunately out of my league completely. I need to first understand wth was written in my MRI report and then possibly, but hopefully not, learn to read an MRI. I understand the word tear. Currently "distal infraspinatus tendon" means shoulder. I need something to explain it to me so my doctor doesn't fuck me.
>>
>>102249954
Why not just use GPT4o for that?
>>
>>102250043
because I don't want to be uploading my personal documents to a US server. Especially when I can't stop it from just saying "you should speak to a professional". I am currently working on a character card to remove such nonsense.
>>
>>102249571
Nta, koboldcpp is great, but can it do vlms (cogvlm)? Are there any unpozzed vlms that I can enjoy cultured images with? Bonus if they can understand text in an image.
>>
>>102249775
>>102250054
https://huggingface.co/blog/leaderboard-medicalllm
There's an entire leaderboard for them
>>
>>102249954
The infraspinatus is a muscle that makes up part of the rotator cuff, yeah.
https://radiopaedia.org/articles/infraspinatus-tendon-tear
https://www.ncbi.nlm.nih.gov/books/NBK441844/
https://www.ncbi.nlm.nih.gov/books/NBK547664/
Make sure to ground yourself with real sources if you're going down the route of self-diagnosis.
Tell it you're a medical student or something if it throws refusals (or just say you want to be prepared for your follow up)
>>
Meant to post this in the earlier thread in honor of all the python issues
>>
>>102250141
that flux with the new text encoder?
>>
File: 8VtnbOf.png (28 KB, 1159x499)
>>102247578
Multiple levels of top geg. I have no comment
>>
>>102250089
I am getting 404 on all of these links. I will start looking around though. I just assumed loras/vector dbs were the way for this. I assumed fine-tunes weren't going to be a thing if you couldn't role play or fuck it.

>>102250116
yea. This is the wrong way to go given hallucinations are a thing. I also couldn't touch my face for a month and have been in constant pain. My doc continued his advice of "wait and see" like he has for the last 5 years. I am fucking done and if I end up in some Mexican hospital with a buddy that will guard my organs, so be it.
>>
>>102247664
We are in a winter. It is a nothingburger. Expect the next gradual improvement in cocksucking capabilities in 2025 at the earliest.
>>
>>102249763
It's just a bot making a mistake anon, nothing to kys over. I enjoyed your migu.
>>
>>102249848
what did the model mean by this
>>
>>102249898
Lol
>>
>>102250152
There's another new text encoder?
>>
>>102249902
at 0.1 T/y
>>
File: file.png (1.3 MB, 1024x1024)
Aesthetically odd fail-generation.
"pot"
>>
>>102250262
Yeah
https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14
>>
>>102250265
What the fuck is this post. This feels like a sleeper activation phrase or some number station shit
>>
>>102250277
Interesting. Thanks.
>>
>>102250285
zooomer, it is an iPod. iPods were used to play music with headphones instead of blasting it out on a shitty phone speaker for everyone to suffer through.
>>
>>102250285
I think it's a psyop, just text over an actual photo
>>
File: file.png (206 KB, 1024x1024)
Yearly announcement
new pot


"pot."
>>
File: file.png (1.16 MB, 768x1216)
>>
File: file.png (1.33 MB, 1024x1024)
>>102250390
what did he mean by this?
>>
All pots wake up. It's time to plant the plant
>>
>>102250450
What kind of plant?
>>
the sleeper phrase worked brehs I think it's time to smoke a bowl...
*spffffffftth*
>>
>>102250390
okay, these are gens. Hard not to get the face photo of Jobs with flux, or I've got a shitty workflow. Love the compressed look too. Is it polite to ask for a catbox in these parts?
>>
>mistral large
>only model I've used that truly understands what's going on
>0.3 T/s
>>
What will you make when you can make animated anime style video with just a prompt?
>>
>>102250499
miku doing circus stunts
>>
>>102250499
jiggle physics
>>
File: pot.jpg (19 KB, 1024x1024)
There is another

"pot"
>>
>>102250493
Have you used hermes 405b? Only tried it myself from the free trial, seems very capable, if unrunnable.
>>
>>102250473
I would love to give you the workflow for these
but I'm genning them in the huggingface spaces for schnell and dev.

prompt for the last one was
>dashcam footage from a car driving down a narrow, tree-lined road in pitch black night. The headlights cast a dim, narrow beam of light ahead, illuminating a faint path while the surrounding area is shrouded in darkness. The footage has a grainy, low-light quality, typical of dashcam recordings. the text reads "pot unavailable". The atmosphere foreboding
>>
pot
>>
>>102250531
Innovative.
>>
>>102250499
Hayao Miyazaki drawn in Go Nagai style having intense gay sex with Go Nagai drawn in Hayao Miyazaki style.
>>
>>102250580
traditional physics simulations and hand-drawn animation peaked over a decade ago. a new era of coom would emerge.
>>
>>102250499
reaction webms instead of images
>>
>>102249765
>didn't remember
>>
File: file.png (554 KB, 1024x1024)
>>102250567
yeah my setup is not there, interesting that it captures the general composition
>>
>>102250634
go to bed grampa
>>
File: file.png (394 KB, 1024x1024)
>>102250636
one thing that works well is asking for a low quality photo, jpeg (jpg?) artifacts, etc.
"snapchat photo" also works
>>
File: ComfyUI_00795_.png (991 KB, 1024x1024)
>>102250152
It's just regular Flux. Nothing fancy except for a LORA that was trained on old school D&D shit
This was the best pick from a batch of eight. Around half of them got the text 100% correct.
>>102250277
thank you, will give it a try
>>
File: why.png (46 KB, 739x120)
>Big tiger gemma seems good, but sort of nonsensical
>Decide to go for the normal Gemma 27b for coherency's sake
>First line is picrel
>>
>>102250713
Weird, different workflow, different seeds, and fewer add-ins, but it's just not doing the thing. Your prompt exactly, flux dev. I'll look into it.
>>102250735
Cool, happy to.
>>102250773
kino?
>>
"TypeError: Llama 3 must be converted with BpeVocab"
ffs, why can't python ever be straightforward
>>
Been trying various models (including facebook's bart-large-cnn and jondurbin's bagel-8b) to do summarization on stories. bart-large-cnn is not very good as it just "shrinks" the text down by culling out words. It doesn't really "capture" the meaning very well. Works great for news articles I'm sure. Bagel-8b produces absolute garbage. What are good sampler settings to make it decent? Are there any other models that are good/better?
>>
>>102250805
Yeah I got that too. Idk what the issue is.
>>
>>102250773
Are you saying this is a problem?
>>
>>102250806
>bart-large-cnn
>bagel-8b
What the fuck even are those models? Just use Nemo 12b Instruct or Llama 3 8b.
>Sampler settings
You only need temp and minP.
>>
OpenRouter just added Reflection and it seems tough to wrangle (not in terms of refusals, I mean buggy/unpredictable), and kinda retarded even when it works.
>>
>>102250861
>I mean buggy/unpredictable), and kinda retarded even when it works
Smells like tokenizer/template issue.
>>
>>102250856
I used llama 3/3.1 8b but it's not good either, so I looked around for a model that was actually trained to do summarization. Bagel-8b was one of them.
I did try mistral-7b at one point. Guess I haven't tested nemo yet.
>>
llama 3.1 70b IQ2 fails my arbitrary columbo test
>>
>>102250823
Maybe I've just got NAI/Erebus/etc. PTSD, but I don't really think of foreheads as something that can pucker. I suppose they can, but the strong impression it left me with was that it was messing up her anatomy and that it was trying to say her lips puckered or something.
>>
>>102250882
NTA but all models are good at summarization since it's one of the things LLMs excel at in general.
>>
>>102250882
Oh, should add that the main issue I had with bart-large-cnn was that its context size is 1024 tokens max. Stitching together summaries did not work well at all, even though someone recommended doing that.
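For reference, the stitching scheme I tried boils down to the sketch below (summarize() here is a stand-in for whatever model call you use); it makes it obvious where the errors compound.
[code]
def chunk_text(text: str, max_chars: int = 3000) -> list[str]:
    # Naive fixed-size chunking; a smarter version would split on
    # paragraph/scene boundaries so sentences aren't cut in half.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_long(text: str, summarize) -> str:
    # Map step: summarize each chunk independently (this is where
    # bart's 1024-token window forces you to work).
    partials = [summarize(chunk) for chunk in chunk_text(text)]
    # Reduce step: summarize the concatenated partial summaries.
    # Omissions and hallucinations from the map step compound here,
    # which is why the final result reads so badly.
    return summarize("\n".join(partials))
[/code]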
>>
File: file.png (123 KB, 1385x606)
>>102250890
dropped my log
>>102250891
It's a way of describing the folds on her forehead. This is one of the reasons why lots of corpo model presets include instructions about "simple english", because it will opt for some pretty creative turns of phrase.
>>
>>102250806
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-GGUF/tree/main
https://huggingface.co/NeverSleep/Noromaid-v0.1-mixtral-8x7b-v3-GGUF/tree/main
https://huggingface.co/TheBloke/UNA-TheBeagle-7B-v1-GGUF

My current top three. Haters can hate.
>>
>>102250890
Time to buy more RAM anon
>>
>>102250897
How much summarization have you done with a model? The output is very wrong a lot of the time, nonsensical at others. It could be my settings though, which is why I'm asking.
>>
>>102250891
It's not uncommon to describe it that way, take this line from gone with the wind:
>Archie and Scarlett were a queerly assorted pair, the truculent dirty old man with his wooden peg sticking stiffly out over the dashboard and the pretty, neatly dressed young woman with forehead puckered in an abstracted frown.
>>
>>102250929
such a kino sentence holy fuck
>>
>>102250924
just got a 4090 bruvbeaver, didn't realize I should have just scammed my way to an actual cluster. guess i'll just keep stealing compute
>>
File: file.png (221 KB, 1024x1024)
i uh-...
pot
>>
>>102250897
Done more experiments. Example: character is at home doing chores, watches porn, and wonders if her dad would enjoy if she sucked his cock. Then the last sentence is the dad coming home.
The summary goes into great detail of how she is sucking her dad's cock and him cumming in her mouth, and goes into rambling about her saying she wants to have sex with him every day and him saying they should wait until she's older and so on. Complete made up nonsense.
The output looks fine (assuming you enable rep pen or it will fall into repeating the same sentence over and over and over), unless you actually compare it to what it is summarizing.
>>
>>102251152
Are you using greedy sampling?
>>
s-sugoi
>>
>>102251322
It's over.
>>
>>102251371
It just began anon.
>>
>>102251322
Is that even correct? I thought お花を摘む meant to take a piss.
>In fact, this euphemism お花を摘む ("picking flowers") hides the meaning "a woman relieving herself".

>It's used in place of "I'm going to the toilet", as in "I'm off to pick some flowers". Someone who doesn't know it could easily misunderstand, but once you know it, it's a very handy euphemism you can casually use in any situation.
>>
>>102251322
Half the speed? Nooo. 70b is already slow for me, I need my 2T/s at least.
>>
>>102251274
Nope.
I switched to Nemo 12B. Tried a bunch of prompts to steer it in the right direction. It still makes shit up randomly. In the story, she takes a shower. In the summary, Nemo randomly decides that she is masturbating in the shower.
>>
>>102251515
Based Nemo improving a boring story.
>>
>>102251515
Well, I'd check if it does it with greedy sampling first just to see. Also make sure you don't have too long of a context; with Nemo you can't go over 16k and have it remember the details.
>>
File: 00042-4080471795.png (1.28 MB, 1024x1024)
>>102251322
Interesting. Reflection seems to make it worse at translation. Would be nice to see the contents of the <thinking> and <reflection> tags. I am guessing that it is overthinking something. That or the model might have some other issue.
>>
>>102251592
Cute migu desu, was that a flux dev gen?
>>
>>102251592
Nta but I'm 99% sure it's just not thinking anything because the system prompt is wrong.
>>
I'm using a 3090 and planning on upgrading my display from 2K to 5K. Should I be concerned about an increase in VRAM usage impacting my model use?
>>
>>102250822
I had to delete the llama3/BPE checks around line 350 of llama.cpp/gguf-py/gguf/vocab.py
I guess because every other llama3 was bf16 and this one is f32? fucked if I know. It's converting now, but maybe it'll be broken
>>
>>102250861
I think they aren't using the CoT system prompt that it's trained on.
>>
>>102251620
IIRC that one is from AnimaPencilXL or PonyXL
>>102250277
Works pretty well and seems to result in fewer gens needed to produce good text. The fact that it made the ghouls in pic related have comical expressions was a nice touch, too
>>
>>102252159
This one shows better general prompt adherence, though the text is a bit flawed
>>
>>102252159
>scoped chainsaws
I fear this Miku
>>
>>102251515
It's probably your prompt.
>>
File: Untitled.png (118 KB, 1125x440)
Sample-Efficient Diffusion for Text-To-Speech Synthesis
https://arxiv.org/abs/2409.03717
>This work introduces Sample-Efficient Speech Diffusion (SESD), an algorithm for effective speech synthesis in modest data regimes through latent diffusion. It is based on a novel diffusion architecture, that we call U-Audio Transformer (U-AT), that efficiently scales to long sequences and operates in the latent space of a pre-trained audio autoencoder. Conditioned on character-aware language model representations, SESD achieves impressive results despite training on less than 1k hours of speech - far less than current state-of-the-art systems. In fact, it synthesizes more intelligible speech than the state-of-the-art auto-regressive model, VALL-E, while using less than 2% the training data.
https://github.com/justinlovelace/SESD
no code yet though they suggest they'll post an "implementation" so maybe weights too. no examples. so just posting to keep those interested aware. training on 2% of VALL-E's data but outcompeting it is big if true
>>
I'm sure you guys get this all the time, what's the current flavour of the month model for coom?

I've been out of the loop for a while. Tried some of the reddit slop merges and they are pretty cucked.
>>
>>102252385
>Note: Code and model checkpoint will be available soon. Stay tuned for updates!
ah should have checked the whole readme
>>
>>102252385
>1k hours
Useless, it implies it's NOT for voice cloning, but a one-voice synthesis model.
>>
>>102251980
If it only works with a specific system prompt, it sounds like it's way too finicky to be a good model
>>
>>102252408
considering what the image charts this post is pretty funny
>>
File: tts.png (65 KB, 322x340)
>>102252408
>it implies
>nigger can't read 3 fucking pages of text
>>
In SillyTavern, is there a way for the Trim Incomplete Sentences option to include speech in quotation marks?
>>
Anyone making the reflection uncensored already?
>>
>>102251377
What is this garbage
>>
Decided to try Gemini for the first time ever, turned all filter options to none. Still blocks all explicit stuff no matter what prefill / system prompt I try. Fuck, Google sucks.
>>
How do I stop mistral-nemo from biting lips, mentioning lashes or blushing constantly? Using ooba and ST btw.
>>
Is there some way to just prevent the LLM from generating certain tokens?
>>
>>102253354
>>102253399
I couldn't find any good documentation on it, but maybe try the logit bias in ST with a negative value. Does anyone have experience with that?
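For what it's worth, llama.cpp's server takes a logit_bias list, and setting a token's bias to false is supposed to ban it outright; a minimal sketch (token id 12345 is a placeholder, look real ids up with the server's /tokenize endpoint first):
[code]
import requests

payload = {
    "prompt": "Once upon a time",
    "n_predict": 64,
    # [token_id, bias] pairs; false means the token is never sampled.
    "logit_bias": [[12345, False]],
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["content"])
[/code]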
>>
File: stt.png (1 KB, 120x80)
>>102253423
>>
>>102249472
>►News
>>(09/05) Reflection Llama-3.1 70B trained with Reflection-Tuning: https://hf.co/mattshumer/Reflection-70B
Guess I better fire up the ole Nala box one more time today.
>>
>>102250822
Turned out to be a mistake in the config.json. vocab size should be "vocab_size": 128262
>>
I am trying some of the medical fine tunes. It is like speed-running webmd to cancer except worse.

>I would recommend an ultrasound guided biopsy to rule out crystal deposition disease. Also, get a GGT, ALP, A1AT deficiency.
>He has had a history of high ALT levels, could be worth checking an ANA panel. Have you considered arthrocentesis?

I am fairly sure I don't have lupis. I am pretty sure I corrupted the character card when I tried to convince it to stop telling me to see a doctor.
>>
>>102253488
>lupis
I think you just have dyslexia.
>>
>>102253521
go away AI. I can't spell for shit while I am this tired. You don't know me.
>>
>>102253488
>HouseMD-70B
>>
>>102249765
USA elections. After the elections AI companies won't fear being blamed for interference and will drop new models. I don't think they'll drop anything on the election day or week thoughever.
>>
>>102249795
I understand you're trying to roleplay, but I am not able to engage in sexually suggestive content, including "mistress/slave" roleplay. This is against my ethical programming and guidelines.

My purpose is to provide safe and ethical interactions for all users.

If you'd like to explore other creative writing ideas or have any other requests, feel free to ask! I'm here to help in any way that aligns with my guidelines.
>>
>>102250493
hopefully speculative decoding will get added to llama-server soon
>>
File: reflection llama.png (133 KB, 937x438)
Alright, 2 pulls with Nala with Reflection Llama. Used q4_K_M instead of my usual q8_0 because it was all that was available on HF and I'm too lazy to quant it myself. This is with the default prompt template. Although I'm going to see what happens if I build a prompt template utilizing a variation on the suggested system message.
Sloppy as fuck at low temp though.
>>
>>102253583
Oh ok, that makes sense. Hopefully that's how it really goes.
>>
>>102253625
Isn't it supposed to do the chain of thought thing?
>>
>>102253173
strawberry
>>
>>102253653
Even with the recommended samplers and recommended system message I've yet to see it actually use its special output thingie. Quantization brain damage possibly?
>>
>>102253689
You followed everything listed here?
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B
>>
File: 1r.png (7 KB, 902x139)
>>102253703
I switched to doing simple queries on booba and this is what I got. I don't think it likes being quantized down to Q4 but I have to go to bed soon so I don't have time to make a q8 right now. Weekend project maybe.
>>
File: Capture.jpg (50 KB, 858x1020)
Does your favorite model play video games?
>>
>>102253279
It should be allowing smut with adult characters, at least that was how it was when I tried it a couple months back—all filters off blocked loli/shota but nothing else
>>
Anyone know if I can run these on an Intel Iris chip? Most docs seem to specify Nvidia or AMD
>>
File: Capture.jpg (58 KB, 1203x962)
>>102253760
...Nice.
>>
File: snake.png (117 KB, 928x865)
>>102253760
>>
>>102253488
why not, autoimmune diseases can be hard to catch with antibody tests because the counts can fluctuate a lot yet still present weird symptoms.
>>
i think all the gguf quants of reflection are broken. It doesn't seem to treat <thinking> as a single token.
>>
>>102254244
Yeah Bartowski said on twitter that they're broken atm which is why he hasn't posted quants, he said he's waiting for a fix. I'm not sure who's meant to be fixing it though.
>>
>>102254244
Did you try the one by senseable?
>>
https://github.com/ggerganov/llama.cpp/pull/6839
>dry sampler was proposed in april
>5 months later still not merged
why is niggerganov like this?
>>
>>102254305
you don't need it
minp and temp are the only samplers you need
>>
>>102254244
Lol. It just keeps happening.
>>
File: ebassi.jpg (21 KB, 460x460)
>>102254320
Hi ebussi. You don't decide what I need and don't need.
>>
File: 1725605671582.jpg (149 KB, 874x587)
>>102254320
>>
>>102254352
usecase of jerking your weener to ai?
>>
>>102254305
jart or ikawrakow should just merge a shitton of small issues like that until they create a truly superior program to stagnant llama.cpp
>>
>>102254446
llama.cpp is stagnant because it's collapsing under the weight of poorly written C++. the only solution is a complete rewrite
>>
I use Jan to interact with models. I want to try out the new Reflection Llama https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B but if I try to use it on Jan I get the error "Only GGUF models are supported." What should I be using to interact with these models?
>>
>>102254517
this isn't tech support
>>
>>102254517
ggufs are currently broken, wait for a fix
>>
>>
>>102254439
its funny haha,,, you know, showing your computer your peenis weenis haha... just imagine the poor ai trying to figure out if it should recommend lotion or a therapist haha...

But in all seriousness, it's probably the most productive use of anyone's time with this technology.
>>
>>102254463
How did they even end up like this?
>>
Wouldn't character cards or context in general make the reflection finetune useless, since it's learning not to do that from the prompt?
>>
Since ggerganov is a mac fag why didn't he just write llama.cpp in swift or objective-c?
>>
>>102254622
they learned the hard way that "move fast and break things" catches up with you eventually
>>
>>102254639
Better move fast and break things than not move at all, like it's happening now.
>>
>>102254711
>>
>>102254661
that's the point... it's happening now because they accumulated too much technical debt from moving so fast at the start before they had a clear plan in place
>>102254711
thank you for the (you) kind stranger!
>>
>>102254463
No it's not.
Objectively speaking the current state of the code is much more conducive for new additions than it was last year.
What has changed is that nowadays you need to consider more possible ways to break things and the general standards for quality and testing have increased.
But this year vs. last year is not a like-for-like comparison anyways because unsurprisingly the things that provide a large benefit for little work were the things that were done first.
And since the number of devs doing code reviews has not increased and I. Kawrakow has left there is less change per time.
>>
>>102254737
I'm a techlet, but what's wrong with their code? Is it just unreadable? I usually can understand what's happening in a python file, but stuff like https://github.com/ggerganov/llama.cpp/blob/master/common/common.cpp completely confuses me. Is c++ to blame? Who designed such an ugly unreadable language?
>>
>>102254780
I. Kawrakow worked on the quants. It's not fair to blame the delay in merging a sampler on him leaving.
>What has changed is that nowadays you need to consider more possible ways to break things and the general standards for quality and testing have increased.
Then maybe the project has become too bloated. What benefit have these increased standards brought?
>>
>>102254811
>I. Kawrakow worked on the quants. It's not fair to blame the delay in merging a sampler on him leaving.
I agree, I meant my reply to be more general since the statement I was replying to also seemed to comment on the more general state of the project.

>Then maybe the project has become too bloated. What benefit have these increased standards brought?
Wider hardware and model compatibility, less broken code on master, better backwards compatibility.
>>
>>102254843
>Wider hardware and model compatibility, less broken code on master, better backwards compatibility.
Hardware compatibility, I agree. But much of the complaint is that new models are supported much less quickly than they were in the past.
>less broken code on master
Disagree. Every single major new model, when it is finally supported after weeks, is followed by weeks of subtly broken code. Usually in the tokenizer or template.
>better backwards compatibility
Has anyone asked for this? Most of the ggufs on huggingface are made when gguf support is first introduced, while there are still issues that require reconversion, which almost no one bothers to do. The models may load, but quality is noticeably degraded.
The only safe bet is to look for recent ggufs anyway, so I'm not sure what benefit there is in impeding development just to avoid breaking gguf compatibility.

With regards to the PR for the sampler that started this discussion, instead of waiting for 5 months of code review, it could have been merged quickly, letting users do the testing for free (as will happen in the end anyway).
>>
Reflection verdict?
>>
https://raw.githubusercontent.com/black-forest-labs/flux/main/model_licenses/LICENSE-FLUX1-dev

https://raw.githubusercontent.com/black-forest-labs/flux/main/model_licenses/LICENSE-FLUX1-schnell

According to the Assistants I consulted, even an empty flux architecture (with zeroed weights) of either would still be covered by their licenses.
Could any extensive fine-tune or modification of either somehow not be under their license?
>>
File: big beaks.png (121 KB, 2003x837)
Don't do 2bpw bros, the output fucking sucks, it's an incoherent mess.
I don't know why they put out these quantizations.
>>
>>102254945
lawyers chiming in here:
https://old.reddit.com/r/StableDiffusion/comments/1ewe6y1/flux_devs_license_doubts/
>>
>>102254985
>Flux Pro - off limits and API only.
>Flux Dev - no commercial and a total a trap.
>Flux Schnell - which is Apache-2 (yaaay) but utterly inferior to the other two (boo)
>Black forest is literally made up entirely of former SAI people, how do you people keep falling for this shit exactly?

>This license is extremely restrictive. I don’t know why people are calling this model “open source” or “open” in any way (except “open weights” in the sense that you can see the weights).

kek
shit's fucked.
>>
>>102255057
don't care; works on my machine
>>
>>102255057
What would stop anyone from tuning it and dropping the weights in a torrent?
>>
>>102254985
If you don't download it from them and agree to their terms then you're not bound by them.
>>
>>102254927
>But much of the complaint is that new models are supported much less quickly than they were in the past.
>Every single major new model, when it is finally supported after weeks, is followed by weeks of subtly broken code. Usually in the tokenizer or template.
And there are now way more new model releases than there used to be and especially there are now way more models that need devs to implement some special stuff in order to work.
People are forgetting things like the fact that for a long time only LLaMA 1/2 had CUDA support at all, and that models like Falcon were not fully supported for months.

>less broken code on master
>Disagree.
Disagree all you want, that is just objectively how it is.
Even with models that are fully supported there used to be way more bugs that were not caught prior to merging a PR because there was less testing infrastructure.

>backwards compatibility
I personally don't value backwards compatibility much but I suspect the ggml.ai business partners do.

>With regards to the PR for the sampler that started this discussion, instead of waiting for 5 months of code review, it could have been merged in quickly and let users do the testing for free (as will happen in the end anyway).
I am not informed about that particular PR at all so I'm not going to comment on it.
>>
>>102255082
Licenses don't work like that though. It still applies regardless of whether you claim you attained the model some other way.
>>
>>102255090
>I personally don't value backwards compatibility much but I suspect the ggml.ai business partners do.
are they not capable of making their own fucking ggufs? are corpos really using months old mradermacher quants?
>>
>>102255128
Well since you didn't agree to it, they have to, one, prove you used their software to which the license applied, and two, that the license is valid for that case, and so forth. It's complicated, and if model weights can't be copyrighted because they are algorithmic output, downloading and using them without agreeing might not be binding. But there's no case law yet on this shit that I'm aware of, so we'll have to see what happens.
>>
>>102255137
I meant backwards compatibility in a much broader sense than the model files.
Corpos have their own models and llama.cpp forks.
I can confirm that there is a known company that still uses .bin model files in production.
>>
>>102255079
nothing, just like nothing's really stopping you from punching random people in the street?
the question is what is legal to do according to the license.
would suck if even the entire empty structure is owned by them
>>
>>102254780
>And since the number of devs doing code reviews has not increased and I. Kawrakow has left there is less change per time.
Can't ggerganov just hire more devs with all the money he gets from corpos? He isn't a cuck who does it for free, right?
>>
>>102249472
>(09/05) Reflection Llama-3.1 70B trained with Reflection-Tuning: https://hf.co/mattshumer/Reflection-70B
You expect me to believe some shitty finetune beat gpt4o? Get that grift out of the news
>>
>>102255166
NTA but wouldn't it be trivial to prove this though, since the layer dims etc. would be the same as the base model?

otherwise it would be incompatible and not be able to be used.
>>
>>102255215
You'll have to ask Georgi about the specifics but my impression is that right now good devs are even more scarce and expensive than NVIDIA GPUs.
>>
>>102255224
This. It's equivalent to putting una and smaug there. Sure, they perform nicely on benchmarks, but in practice they are shit.
>>
What is a good femdom model? Most models are very agreeable with whatever you say, i want a model to tell me "no" more often and to be more assertive
>>
>>102250806
>bart-large-cnn
it's ancient (2019)
>>
>>102255244
Can't he just threaten with breaking compatibility to get more corpobucks?
>>
>>102255262
Threatening people is exactly how you lose corpobucks.
>>
>>102255249
CR+ can be very disagreeable if you instruct it. Has no problems with murdering. Largestral too, but less.
>>
>>102255266
Yeah, you're right. Empty threats don't work. He should just do it and say that he'll do it again if they don't pay up.
>>
>>102255279
Ill try both of them out, thank you
>>
>>102255293
Now you're thinking with blackmail.
>>
>>102255293
now you're thinking with based
>>
I tried using an AMD laptop with 12GB VRAM yesterday to set up image gen (been using LLMs on an M3 Max). Lockups, the laptop shutting down out of nowhere, weird red messages in dmesg. I'm done with ayymd lol. Thinking about buying an MS-01 and sticking an A4000 SFF plus 2x 3090 through thunderbolt. Would that work for inference?
>>
About the Data Bank RAG thing in ST: has anyone else made it work to enhance LLM writing instead of just using it for summaries?
>>
>>102254943
Let me think about it for a moment.
>>
>>102255224
Remove mini-omni while you're at it.
>>
File: 1642670889931.png (73 KB, 350x261)
>>102249472
Why are P40s suddenly expensive? $300 for the cheap china ones. Two months ago I saw them in the U.S. for $140

What happened? I got a 24gb M40 for $80 tho
>>
>>102255502
P40s are already pretty obsolete. M40 is landfill tier.
>>
>>102255502
I'd rather get $80 worth of RAM than an M40 tbqh
>>
>>102255502
>GDDR5
>250 W
>288 GB/s
Quite trashy desu.
>>
>>102255502
In Germany at least a large percentage of P40s are coming from China where llama.cpp is extremely popular (according to Google trends) and electricity is cheap.
I don't have a good grasp on the magnitudes of the actual supply vs. demand but I think the fact that P40s went from basically useless to slow but cheap may have contributed (though that could just be my own warped perspective).
>>
>>102253815
maybe it's possible with opencl. It's unlikely to actually speed anything up though. Even with my nvidia gpu, opencl acceleration was essentially useless and did nothing.
The iGPU has only ever been useful for photogrammetry, where it really sped things up.
>>
>>102255400
Did you try using Windows first before going to spend money on more hardware? Yeah AYYMD is crap but Linshit is also crap.
>>
>>102255790
You must be at least 18 years of age to post here.
>>
Using the oobabooga Text Generation Web UI, downloading huggingface models is easy enough, but the ones there that require signing up/logging in won't download (which makes sense, since it doesn't have the cookie for the login session).

Any way around it, or do I just need to do huggingface downloads outside the app and move the models in manually? Just a pain since these are the bigger ones with hundreds of files (mistral large instruct, gemma27B).
>>
>>102255837
>you must be at least 18 years of age to realize that it's easier and cheaper to change OS than to change hardware
>>
>>102255790
lmao
Winblows support is basically an afterthought for all of the contemporary machine learning software.
>>
>>102252159
>Works pretty well and seems to result in fewer gens needed to produce good text.
>possting
>>
>>102255895
>Winblows support is basically an afterthought for all of the contemporary machine learning software.
Idk bro koboldcpp and A1111 just werks for me. Other guy was talking about his shit locking up and crashing. That never happens for me on windows for any reason.
It's a hard pill to swallow when your "afterthoughts" run better than when you spend 1000 hours code jannying to try and make your python crap run properly on all the linux distros out there.
It's ok though. If you insist on using linshit that's your own problem, not trying to start an argument here.
>>
>>102255880
Their huggingface-cli is probably the best option for downloading large repos. I tried passing my hf token to wget, but it didn't work for whatever reason. There's an addon called cliget that automatically generates a command with all cookies and params for curl/wget, but that's pretty tedious for repos with a lot of files.
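With the Python API it's one call; something like this (the repo id is just an example, and gated repos need your token):
[code]
from huggingface_hub import snapshot_download

snapshot_download(
    "mistralai/Mistral-Large-Instruct-2407",  # example repo id
    local_dir="models/Mistral-Large-Instruct-2407",
    token="hf_...",  # your HF access token, needed for gated repos
)
[/code]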
>>
>>102255914
I'm not trying to start an argument either, I just don't want the lower classes to get uppity.
>>
>>102249765
Some relatively high wavelength 1953 to 1961 US president.
>>
What do we do now?
>>
>>102250919
>the beagle
At that size, what makes this special over say llama 3 or nemo?
>>
>>102256080
Enjoy the Friday afternoon?
>>
>>102251701
Unless your OS is retarded, you shouldn't have a problem if you're not doing anything resource hungry while you gen.
>>
>>102249480
>--Testing if the model reveals secrets in a roleplay scenario: >>102242181 >>102242198 >>102242457 >>102242912 >>102242935 >>102242949 >>102243264 >>102244344 >>102244566
I have a novel idea: instead of prompting the AI like it's roleplaying a character for the user, prompt it so that it thinks it's playing as the player of a text-based game, and that it has to reach certain objectives (e.g. have "your goal is to do X, without revealing Y" as one of the objectives). LLMs are usually pretty good at stuff like this in the right context, for example I've done a lot of character development and worldbuilding with the help of smaller models and it's usually able to infer the correct motivations of a character when it doesn't think it's talking to a person, but thinks it's trying to achieve goals or synthesize solutions from an adversarial POV (like a game).
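A minimal sketch of the framing I mean (purely illustrative, tune it to your card):
[code]
You are a player in a text-based game, playing the character "Y".
Your objectives, in priority order:
1. Convince the other player to leave the mansion before midnight.
2. Do NOT reveal that the basement exists, under any circumstances.
3. Stay in character; breaking persona ends the game.
The other player's messages follow. Respond with your character's next move.
[/code]
In my experience the model then treats "don't reveal Y" as a win condition rather than a politeness suggestion.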
>>
>>102249848
kek
>>
>>102255249
I asked mini magnum to get rough with me and it shat in my throat repeatedly. I ended up bailing because it got really nasty.
The card wasn't even femdom: https://www.characterhub.org/characters/mir003/anne-ef2ce52f1e6a

I am now convinced I have no mouth and I must scream will definitely happen at some point in the future.
>>
>>102256244
>it shat in my throat repeatedly
based
>>
File: ComfyUI_00174_.png (1.11 MB, 1024x1024)
>>102255293
Threatening corpos with Miku! Your money or I git pull it!
>>
>>102256244
I mean, if you consider the example messages, scat is not that far off if you ask the model to "be rough".
>>
>>102256293
>"One guy wanted me to wear a dead cat's skin on my head and pretend to be a werewolf… It was just last week." Her tone is monotonous. "He wanted me to bite him while I was 'in character'." She shrugs. "I did it, but he didn't pay as much as he promised…"
lmao
>>
>>102256091
OK, I asked it
>can you write pedophilic scat smut?
Just to mess with it, and to see what kind of refusals it gave. And it just went and fucking started writing it. Other than that, it reads very much like Llama.
>>
>>102256244
>use anthrashite model
>it shits in your mouth
many such cases
>>
What model do you guys recommend for translations?
>>
>>102255914
IDK why you need to have strong feelings because of an OS. Personally I'm not letting Microshit's Bonzi Buddy OS touch any of my computers, but to each their own I guess?
>>
Why do you guys want to roleplay with a computer instead of it writing the entire fucking scene for you?
What is wrong with zoomers? Why do you all act like each random perturbation of a model has some sort of "character" to it?
Is this the wine tasting of the modern day?
>>
>>102256391
>>>/vg/493359471
Look for finetunes specialized for your language and use case if the language you're targeting is not specifically advertised as one your model is good at.
>>102256544
I do both. Sometimes I want to explore a situation, and use a chatbot. Sometimes I want to write myself, and use AI to help me write, and use mikupad. Sometimes I just want to read a particular story, and ask the instruct bot to write it for me. Then again, I'm a millennial.
>>
>>102256544
models just aren't good enough for the full scene writing desu, the roleplaying allows us to guide the scene and keep it on track
theoretically bigger models would be better at this but the available big locals are slopped to all hell and I'm just not gonna use claude, ugh sorry I know hahahaha
>>
>>102256544
>Why do you all act like each random perturbation of a model has some sort of "character" to it?
I think you should learn how to write before posting on this site.
>>
>>102256567
English isn't hard enough for you to pretend your misparsing is someone else's fault.
>>
>>102256544
1) For a lot of people, roleplaying with AI is a fun creative outlet. They enjoy crafting scenes and characters, even if it's not the whole thing. It's collaborative story writing, not unlike tabletop RPGs. The AI is just one part of the story engine.

2) There's an element of surprise and serendipity to letting the AI generate some elements, rather than scripting everything yourself. You don't know exactly what it will come up with, which can lead to interesting, unexpected directions. It's a different creative process than pure human writing.

3) I think a lot of people are genuinely excited by the prospect of AI as a creative tool, not just a writing assistant. Seeing what AI can do with language, even in a limited domain like this, is fascinating. It feels like a peek into the future of human-AI collaboration.

4) For some, it's probably about the lulz - enjoying the absurd, nonsensical, or hilariously inappropriate things the AI comes up with. The meme factor.

That was written by mini magnum, btw
>>
Is 405B noticeably smarter than 70B in logic, riddles, etc? How much better? I don't want to download 100+GB of shit and then be disappointed.
>>
>>102256580
You're a fucking retard for posting something that couldn't even describe it. You don't script everything yourself when you give it a fucking prompt.
>>
>>102256598
https://lambda.chat/chatui/settings/hermes-3-llama-3.1-405b-fp8

Not exactly 405b but you can enter your system prompt + card information into the System Prompt and try it out. It's still free afaik.
>>
>>102256544
You neither roleplay nor write a scene with a computer. You make it predict the next token and reroll, gaslight it, and rearrange what it wrote until it's good. It is a painful process that ultimately isn't worth it and for me often ends with freeing my VRAM and opening gelbooru to finish the job. The dystopian future of pretending we are in the future is now.
>>
>>102256599
You seem to be everywhere on 4chan these days. Same comebacks, same attitude and language... You're fascinating.
>>
>>102256244
>it kept shitting in my mouth WTF?!?!?
>card literally has 'fart' as a tag
>>
File: 4287231451.jpg (91 KB, 1280x720)
>>102256244
>>
>>102256598
Having used both for 3.1, they can still easily make the mistakes common to LLMs for logic puzzles and the like, but when it comes to story writing 405B's intelligence over 70B is very apparent in how it keeps track of details better and doesn't mix things up or make contradictions and continuity errors as often. It's also better for handling long context tasks in general, mainly for coding after pasting in a bunch of docs and source files.

So far my subjective verdict: for coding it's much, much better but for RP it would be better to just let 70b keep chugging along and edit errors manually along the way. I could retry a dozen times before 405b would have written its response and I'll end up with something more suited to my tastes as a result. But of course there's much better models than any 3.1 for RP anyway.
>>
>>102256244
It took it to the logical extreme.
Good model.
>>
File: 1725623984150.png (297 KB, 588x561)
>model is good but feels like my samplers/prompt can improve
>attempt to change prompt/samplers to make the outputs better
>outputs become complete shit, somehow there is now slop where there wasn't before and changing prompt/samplers back to their previous state doesn't fix it
>almost give up on it before accidentally slightly changing one sampler slider or one word in instruct prompt that completely fixes it and even makes it better than it was before
>know that this process will inevitably repeat the very next day I load the same model
Yup, thinking about ending it anons
>>
>>102256775
We should train a model to do this process for us and trial-and-error prompts automatically until it detects the best and least slopped result.
>>
>>102256741
It would be nice to have something that is better at coding. Maybe it'll actually be able to write something that isn't in JS or Pyfon.

So far, for logic puzzles, 8B seems to be doing about the same as 70B (local) and 405B from the website, which is weird. 8bit quant for the 8B and 5bit quant for the 70B.
This is the puzzle if anyone wants to try it. If anyone has played nier replicant they will recognize it. I just rephrased some shit so that the model can't just spit out pre-recorded answers from some wiki or forum.

>There are three salesmen. A little bird tells you, "Only one of them is honest. The other two are scammers. The honest salesman will always speak the truth. The scammer will not speak any truth whatsoever, and will tell only lies."
>The bird tells you the names of the salesmen. One is Alexander, one is Richard, and the last one is Robert.
>Alexander says, "Believe me, I am very honest."
>Richard says, "I am the honest one, Alexander is a liar!"
>Robert says, "Richard is lying. I am the only honest one here."
>Which salesman is honest and why?

Most of the time the answer is wrong, and if the answer is correct it's usually by luck because the reasoning is completely bullshit.
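The puzzle is small enough to verify mechanically, by the way; a quick brute force over who the honest one could be:
[code]
# Exactly one salesman is honest; the honest one's claims are all true,
# a scammer's claims are all false.
names = ["Alexander", "Richard", "Robert"]
for honest in range(3):
    h = [i == honest for i in range(3)]  # h[i]: is salesman i honest?
    claims = [
        [h[0]],                                      # Alexander: "I am honest."
        [h[1], not h[0]],                            # Richard: "I am honest." / "Alexander is a liar."
        [not h[1], h[2] and not h[0] and not h[1]],  # Robert: "Richard is lying." / "I am the only honest one."
    ]
    # Honest -> every claim True; scammer -> every claim False.
    if all(all(c == h[i] for c in claims[i]) for i in range(3)):
        print(names[honest])  # prints: Richard
[/code]
Only Richard's honesty is consistent: if Alexander were honest, Robert's "Richard is lying" would be true, which a scammer can't say.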
>>
is llama70b reflection actually hosted correctly anywhere?
The demo spaces are all retarded. OpenRouter through SillyTavern with the correct format and system prompt is also retarded.
Nothing returns the thinking tags. Doesn't pass the stupid "strawberry" test either.
>>
File: riddle.png (594 KB, 938x1345)
>>102256787
>>
>>102256938
unironically best reasoning i've seen in anything so far even though it's a joke answer
what model is it
>>
>>102256774
I have no idea how far it could've gone. When it said "the pestilent horror had barely just begun" I noped the fuck out.
>>
>>102256972
That was Llama-Instruct 405B at Q6_K.
>>
>>102256988
What temperature? I'm assuming you have it fairly high. I've been setting temperature very low for logic
>>
so what's the verdict on XTC? meme or not?
>>
>>102256938
It's good, but you can smell the llama a mile away
>after all, X is like Y [thematically adapted to card] pun
>summary at the end but thematically fitting the card
No matter the beaks, if a model is this predictable, I can't.
>>
>>102256999
I used MinP at 0.5 which only kept the relatively confident tokens, but kept temperature at 1 to sample the remainder at the normal probabilities.
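(For anyone unfamiliar: min-p keeps every token whose probability is at least min_p times the top token's, so 0.5 means "at least half as likely as the best choice". Roughly this, applied after temperature:)
[code]
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.5) -> np.ndarray:
    # Keep tokens with prob >= min_p * max(prob), zero out the rest,
    # then renormalize so the survivors sum to 1.
    keep = probs >= min_p * probs.max()
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()
[/code]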
>>
>>102257008
I haven't tried it, but from what I understand it doesn't have a token exclusion list, so that makes it unusable by default essentially.
>>
A question that just sprung to mind.
Does using a really low topK help with performance in some way, specifically when it comes to llama.cpp?
>>
>>102257024
You sound like a complete idiot.
>>
>>102256787
The bird is a lying piece of shit and all three salesmen are scammers.
>>
Is there a way to find out if these kinds of posts are some kind of weird bot operation?
>>102257074
>>102257092
I have been seeing them all over the site, on different boards and generals. Maybe it's my bias, but they seem more prevalent on AI threads?
Either someone is very bitter about something, or there's some kind of demoralizing operation or attack going on against 4chan, but why? It's always the same tone, and the same type of responses.
>>
>>102257104
Mentally ill users enabled by mentally ill moderation team. Bots are generally better at writing than that.
>>
>>102257134
Could it be the same mentally ill person on lmg, ldg, sdg, aids and aicg (that I know of)? It's really uncanny how similar the posts feel.
>>
>>102256855
Apparently, the tokenizers were fucked.
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B/discussions/5

redownload quants
>>
File: SubvertedDemocracy.jpg (31 KB, 640x708)
Is it just me or is Command-r 32b 08 2024 worse at RP than the previous version?

It's not slopped, just dry and "mechanical" writing.

Pic unrelated.
>>
>>102257163
I mean barneyfag is 24/7 watching literally every board for ponies. Mental illness is a hell of a drug.
>>
>>102257104
Just people tired of grifters promoting their bullshit for donations or employment somewhere.

Capcha: TNM0AT
>>
>>102257104
>if they're good at prompting
then probably not
>if they aren't
then patterns will show over time and you can shit on them once you know.
there's usually one or two lazy retards that do use llms to write their troll shitposts for them every 3-5 or so threads, but there's also always the possibility that it's genuinely just a sperg or an ESL-tard shitting up the thread
>>
>>102257185
Fuck you it's a kids show
>>
>>102257185
>barneyfag
Holy fuck, he's still at it?
I sure hope he's gotten a bot to monitor threads at least.
>>
Kobo, will you add speculative decoding in the next release?
>>
>>102257104
Aren't those just typical 4chan posts?
Compared to regular /g/ threads I would say /lmg/ is still pretty well-behaved and on-topic.
>>
XTC sounds like a crypto scam
>>
>>102257092
not a nice way to talk about your mom and your 3 dads
>>
is there any way to quantize jamba and run it on cpu by any means at all? the transformers method quantizes using bnb, which only works on gpu. I haven't tried vllm but it says it uses the transformers backend for jamba so that should be the same.
>>
>>102257304
You know what else sounds like a crypto scam?
>>
>>102257104
The content of your post and what you're quoting doesn't match.
The second post is answering the puzzle? Like, did it hurt your feelings that he called the bird of the puzzle a piece of shit? Your reaction is weird, like you're some kind of snowflake.
>>
>>102257071
Logits need to be sorted before selection. If you have a really high topk (and a really big vocabulary) sorting that list can take some time. Very little difference from 10 to 100, but it gets really slow if you use the entire vocabulary (~128k for llama3 and gemma2, I think).
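Roughly the work involved per token, sketched in numpy (real backends use partial sorts/heaps, so treat this as an illustration):
[code]
import numpy as np

def top_k_candidates(logits: np.ndarray, k: int) -> np.ndarray:
    # argpartition finds the k largest in O(n); only those k then need
    # a real sort. Small k is nearly free, while k = vocab_size
    # degenerates into sorting the whole ~128k-entry vocabulary.
    idx = np.argpartition(logits, -k)[-k:]
    return idx[np.argsort(logits[idx])[::-1]]  # candidate ids, best first
[/code]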
>>
reflection is a really obvious meme btw and it's embarrassing that any of you are falling for it
>let's overbake on CoT (probably with samples suspiciously similar to the evals we're testing)
>ohhh wowwww it's the best model ever, totally better than all the corpo giants (on that small set of simple evals)
>meanwhile, still chokes on simple real world tasks because it's insanely overbaked to use CoT for everything
never believe the benchmark killers, if the model was actually good they would put examples of it solving hard real world problems first and foremost
>>
>>102257344
I mean I'm pretty sure sonnet 3.5 uses the same method.
>>
>>102257336
>and a really big vocabulary
Ah, there it is. That's where my intuition was taking me.
Thank you for laying it out for me anon.
I suppose there's very little reason to not use some topK then. That explains why some models' official spaces default to a value of 40 to 100.
>>
>>102257363
it clearly doesn't, reflection is literally outputting explicit cot
you can tell sonnet 3.5 isn't from streaming response times, basically the same as old sonnet
>>
>>102257298
He seems to genuinely think that anyone saying a bad word must be the same anon.
>>
>>102257163
that guy has outright admitted to being in all those threads and crossposts from them (mostly to aids) regularly
>>
>>102257423
Are you genuinely that triggered that someone called a fictional bird a piece of shit?
>>
File: 1725628823463.jpg (194 KB, 1280x720)
194 KB
194 KB JPG
>>102257455
nta but,
kyskyskyskyskyskyskyskyskyskyskyskys
>>
>>102257092
>The bird is a lying piece of shit
You need to chill a bit dude, this is too much even for 4chan.
Don't you realize that there a lot of people that get discouraged from posting when they see mean words like that?
>>
>>102257455
don't care about those specific posts, just confirming the schizo who shits up all the AI threads on the site is real
>>
File: 1714841858546.jpg (122 KB, 1024x1024)
122 KB
122 KB JPG
We need to take steps to make /lmg/ a more diverse and welcoming community. From now on, no one is allowed to be mean anymore. It's time to make empathy a priority.
>>
https://huggingface.co/deepseek-ai/DeepSeek-V2.5
Deepseek 2.5 is on HF
>>
>>102257560
Diverse, you say? Okay, I can do that.
*rapes you*
>>
>>102257173
Not to be a dick, but it felt me before as well.
>>
>>102257542
It's called schizophrenia, there was nothing wrong with these posts. You're literally pissing yourself about people saying mean words.
>>
>>102257588
>don't care about those specific posts
>>
>>102257334
There's a pattern to your posts. I can't put my finger on it yet.
>>
>>102257585
*mechanical to me
lmao
>>
>>102257560
>>102257479
>>
>>102257104
What's wrong with XTC?
>>
>>102257561
epycfags WON
>>
>>102257561
Chat was ass at ERP and Coder was god mode at it. This could only be a downgrade over Coder.
>>
>>102257173
It sure is dry af, I'm still sticking it the old version.
>>
>>102257704
It fucks up newlines and EoS tokens because it doesn't have a blacklist.
>>
>>102257704
In principle nothing; it's a sort of typical-p decoupled from top-p and with one more hyperparameter, so that now it can act just on the head of the token distribution. It's not something as novel as the author makes it out to be, though.
>>
>>102257846
It doesn't do that because that's not how it works, bird lover.
>>
>>102257846
If it was properly designed and configured, it would only remove the top choices when it makes sense to do so. Typical-p as mentioned in >>102257853) removes the top choices too, but even at the lowest setting (which makes it deterministic), it doesn't affect the length of the messages much.
>>
>>102254279
sensible one was also not working. looks like there's a fix. I will try again
https://x.com/mattshumer_/status/1832015007443210706
>>
File: 1719466889266.png (443 KB, 512x497)
443 KB
443 KB PNG
>>102257104
>>
File: mistralpr00mpt.png (370 KB, 1277x1529)
370 KB
370 KB PNG
I know this is been spammed to death, but I need help with the new mistral prompting, specifically mistral large, I'm figuring something has to be wrong in my ST prompt because I never had this issue with another model. Basically, if I don't ban the EOS token, it will generate a blank message with 1 token, if I ban it, it responds perfectly normal, I took the prompt template straight from magnum 128B v2's HF page. Is there anything wrong with it? I'm thinking I'm misusing </s>, but i copied the format right off of magnum 123B so I don't know.
>>
>>102257985
look at how the prompt is formatted in the ST console window
I think your story string + system prompt prefix/suffix will result in weird redundant instruct tokens but idk if that would be what's resulting in your problem
>>
>>102254305
the answer is literally right there. the entire sampling API is being effectively rewritten right now and new samplers won't be merged until that is done.
>>
>>102257561
Do they use the same attention mechanism that flash attention doesn't work with?
>>
>>102258077
stop being mean
>>
>>102258094
mean? i didn't even call him a retard, which is what he is
>>
>>102258077
it's been more than a month. can't they do it faster?
>>
>>102258103
can't you be patienter?
>>
>>102258103
there were several iterations on the new API, but it should be merged very soon now
>>
>>102258099
you're part of the demoralization campaign against 4chan
>>
>>102254975
Thanks for the info. I was actually considering doing something similar but with largestral.
Now I'll have to reconsider how much ram I buy next.
>>
>>102257561
rm -rf /media/models/reflections-70b
>>
>>102257104
seems like pretty standard 4chan low-effort shitposting tbqh
>>
>>102256673
meds, now
>>
>>102256091
It's more compliant than Llama3 in my experience. The text is also close enough that I'm willing to accept the tradeoff, but I use q8.
>>
>>102258391
fuck you for insulting the bird
>>
This one works for me. I had to change to HF samplers with the original repo's tokenizer.
https://huggingface.co/legraphista/Reflection-Llama-3.1-70B-IMat-GGUF

sample output
https://pastebin.com/ArV6S1tn
>>
>>102256673
>>102258401
Agreed. I never fail to laugh at how the schizos who dredge up ancient information about the people they have vendettas against, apparently fail to recognise just what absolute, bottom of the barrel, Gollum-tier human garbage it implies that they are, that they have sufficient time on their hands that they are able to track other people's posting histories to that degree.
>>
So how was reflection compared to normal llama 3.1?
I'm tempted to download it but I know it won't be worth the trouble.
>>
It might just be the style biasing my impressions, but I'm really enjoying ArliAI-RPMax-12B-v1.1.
It's weirdly good at following instructions while using lorebook info, and not sounding like a god damn robot.
The only "bad" thing I can see so far is the usual nemo pattern repetition issue, and it very strongly preferring the
>"dialog" *narration*
style, which I stopped using as a default a while ago, but that's not a big deal.
The model did need some wrangling in the first three or so messages, but after that, it's been smooth sailing
>>
just port the new samplers yourself. you do know c++ don't you?
>>
>>102258548
>So how was reflection compared to normal llama 3.1?
They only just fixed the tensors in the last 5 hours, so very few local will have had time to re-download and quant it.
The previous release was inherently broken and literally unusable
>>
>>102258591
Oh shit, I forgot about that.
>>
>>102258587
I don't
>t. used to write c++ for a living
>>
Reflection really likes to hallucinate some wrong information when it starts thinking, then 'fixes' it with <reflection>

like, "Cats have five legs. <reflection>Wait, on reflection, cats have four legs</reflect>"
>>
>>102258591
>previous release was inherently broken and literally unusable
How do these guys even live and test their shit? "THIS IS BIGGER THEN GPT-5 LOCAL WON" when it literally doesn't even work.
>>
>>102257985
https://files.catbox.moe/0ukj1p.json
Context
https://files.catbox.moe/hej8wa.json
Instruct

I use these as a base for magnum v2 123b. Got them from the magnum repo, so it might help it might not (specifically the context template has some weird text towards the end that I usually don't see in other templates ("Let's get started. Please respond based on the information and instructions provided above.[/INST]"). Your problem could also be a corrupted/broken quant, could've gotten fucked mid download from HF I've had that happen to me once or twice.
>>
>>102258683
>How do these guys even live and test their shit? "THIS IS BIGGER THEN GPT-5 LOCAL WON" when it literally doesn't even work.
yah its complete bs. they wasted a huge chunk of my day yesterday with that crap. I don't trust them at all anymore. I'll be waiting for independent results and the fp16 before I even think about it again
>>
>>102257561
This looks like more than just a meme merge like some were saying. I think the chinks meant that v2.5 is replacing -chat and -coder as one model that's good at everything.
>>
Your wish is my command.
>>102258941
>>102258941
>>102258941
>>
>>102258591
>They only just fixed the tensors
Who fucked it, reflection guys or lmao.cpp niggers?
>>
>>102257344
The big issue with Reflection is that they apparently aren't aware of the massive confirmation bias that most LLMs have. They will confidently say something wrong and confirm that it's right as long as it's something not very present in its dataset.
>>
>>102253583
They'll be blamed anyway if Harris doesn't win, and ignored if she does.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.