/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102645080 & >>102632446

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
recap anon gay
>>
>>102654480
slash recap anon's salary
>>
Tell me something you've done with LLMs today.
>>
Does anyone have a magnet for llama 3.2 11b vision? For whatever reason it's unavailable to download in Europe.
>>
>>102654548
I was gooning a little after midnight but then I got bored (mistral large is getting stale) so I switched to gelbooru.
>>
>>102654480
slash miku's throat
>>
►Recent Highlights from the Previous Thread: >>102632446

--Local is dead, in other news the new OpenAI advanced voice mode is pretty cool

►Recent Highlight Posts from the Previous Thread: >>102632451

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102654548
found some nonpozzed models with claudeslop
>>
>>102654614
I think you should seek help, I'm not even trying to be mean. Your obsession with this thread is very clearly unhealthy.
>>
Claude:
>API
>so good localfags finetune on its worst outputs
OpenAI Advanced Voice:
>API
>first TTS to do convincing emotions at inference
Dall-E 3:
>API
>so good localfags finetune SDXL (lol) on its worst outputs
local:
>slop
>shivers cope
>xtts-v2 cope
>flux cope
>discord sloptuners calling for reddit moderation in /lmg/
It's fucking over isn't it?
>>
Is 48GB of VRAM enough to make a 70B AWQ quant?
>>
>>102654710
>>xtts-v2 cope
valle + a LORA is all you need.
>>
File: 1727896698406980.png (140 KB, 766x733)
they can't stop winning...
>>
>>102654739
>https://valle-demo.github.io/
>404
LOCALBROS...
>>
>>102654614
>>102654710
>>102654744
hi sam

>>102654701
it's sam. he is still butthurt about the fact that meta's llama 405 performs at the level of gpt4 and is open source. he can't take it away, he can't moderate it, so he just tries to scare new people off to prevent wider adoption.
>>
>>102654548
I made anime girls real!
>>
File: losang1.png (465 KB, 1178x983)
>>102654710
>flux cope
I KNOW this is just a shitpost but flux is anything but cope.
>>
>>102654614
petra is better than you
>>
File: 1459460919407.jpg (45 KB, 396x337)
>>102654548
Realized that it is functionally impossible for me to socialize with real people for more than a few minutes both because they bore me to tears and because what I consider a "good time" hanging out makes other people miserable.
>>
>>102654563
https://huggingface.co/unsloth/Llama-3.2-11B-Vision-Instruct
The least you can do is search for a reupload, you lazy fuck.
>>
>>102654802
Is it a competition?
>>
>>102654799
I'm not shitposting and it's not cope. flux doesn't hold a candle to Dall-E in terms of prompting and even SDXL is better at things like ripped clothes/dirty faces/blood.
>>
>>102654856
slop-e is worse at humans, sorry sam
>verification not required
>>
>>102654799
I don't know, Flux didn't really impress me. I guess you can make nice Migu images with it, but you can find those on the boorus too.
>>
>>102654883
>absolutely no argument
I accept your copecession.
>>
>>102654856
>flux doesn't hold a candle to Dall-E in terms of prompting
I literally tested all of this on day 1.
Flux blows DALL-E out of the water for prompt understanding and conceptual granularity. Like it's not even a fucking contest. Buy a fucking ad saltman.
>>
I have 96GB of VRAM and 128GB of RAM. Thinking about trying some 405b quants locally, but I've never attempted this before. I have some questions if anyone can help.

1. Can llama.cpp even load a model split between GPU and CPU without loading it entirely into RAM first? Meaning if I had a 150GB model, would it OOM while trying to load it?
2. Is a quant like IQ2_XXS usable for 405b? Is it better in any way compared to a 70b q8?
3. I remember something about certain IQ quants being slow if offloaded to CPU. Is that still a thing, and if so, which quants does it affect?
>>
>>102654892
>I literally tested all of this on day 1.
Then you won't mind sharing some of the comparison images and prompts? I look forward to seeing them.
>>
>>102654903
>96GB of VRAM
4x 3090?
>>
Can I run a decent quant of mistral large with 24gb vram and 64gb ram? I don't care about inference speed. If I get 0.3t/s that's fine. I just want to try it.
>>
File: realtime.png (5 KB, 607x28)
they're either torching money on advanced voice or API pricing is a scam
>assume a voice convo costs $0.10/min
>if you use it 15mins/day, that's $45/month
>>
>>102654913
you haven't too :)
>>
>>102654903
>Can llama.cpp even load a model split between GPU and CPU without loading it entirely into RAM first?
Disable mmap.
2. No idea. I suppose they're reasonable if people use them for 70b.
3. At that point it doesn't matter much. It'd be the difference between 0.1 and 0.15 t/s.
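Something like this, assuming a recent llama.cpp build (the binary is llama-cli now, main in older builds; the model path, -ngl count and context size are placeholders):
./llama-cli -m /models/405b-IQ2_XXS.gguf --no-mmap -ngl 60 -c 8192
--no-mmap reads the weights into allocated buffers instead of mapping the file, and -ngl sets how many layers go to the GPUs; whatever doesn't fit stays in RAM.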
>>
>>102654880
Ugly blonde chick spammer, I'm sorry for being mean to you. You are much better than that boring proprietary cocksucker that we got in your place.
>>
>>102654941
>I have proof local is better but I'm not going to share it
Caught lying and conceded with an emoticon immediately. Embarrassing.
>>
>>102654973
ADOLF HITLER IS A NIGGER
>>
>sperging because he got caught in an obvious lie
Yeah, it's over for local.
>>
>>102654960
^_^
>>
>>102654917
4090s, but yeah basically that
>>102654953
Thanks, I'll try with mmap disabled and hopefully it loads.
>>
>>102654960
The thread getting more dead means even the shitposters get replaced by lower quality ones. Like, Petra anon was way more interesting than the cloud shill and the buy-an-ad spammer.
>>
>>102655038
Hi Sao
>>
>>102654903
I'm way out of that range. I could draw comparisons from others though. I think that 2.5bpw quants of 70b models outperform q6_K_M quants of 22b stuff.
>>
>>102655038
Hi Cuda dev
>>
>>102654927
You won't be able to fully load any decent quant of it on GPU. On RAM the biggest you can load is Q3_K_S. If you can, quant it yourself with bf16 or q8_0 embed and output, they should improve output quality without increasing model size too much. Get full 128GB so you can run Q6_K, it's worth it, trust me.
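If you do the self-quant, it looks something like this with llama.cpp's llama-quantize (the tensor-type flags exist in recent builds; file names are placeholders):
./llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 mistral-large-f16.gguf mistral-large-Q3_K_S.gguf Q3_K_S
That keeps the token embedding and output tensors at q8_0 while everything else gets the smaller Q3_K_S treatment.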
>>
>>102654960
petra is a proprietary cock sucker, he keeps spiting any local generals.
>>
>>102655038
This thread seems pretty active to me.
>>
>>102654913
I don't care about you that much to bother C:
>>
File: No.png (192 KB, 309x326)
>>102655038
No, we're not going back to the cloud.
>>
>>102654960
Oldfag here, I agree with this. I have been here since the beginning and as far as I remember Petra at least cared about the general and wanted to end the Miku menace.
>>
>local is bad
Oh really? anthracite-org/magnum-v2-123b
>>
>>102655070
Thanks.
>>
>>102655125
>Petra at least cared about the general
No he fucking didn't, he did the same shit he did to /vsg/ by spreading FUD to try and kill the general.
>>
File: file.png (6 KB, 530x59)
>>102655085
believe
>>
I'm using Silly's vector functionality with its native transformer.js lib, using
>Snowflake/snowflake-arctic-embed-m
as the embedding model.
Opinions, suggestions?
I'm using llama.cpp to serve the main model. I can't use that to both generate text and provide the embeddings functionality at the same time, right?
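(If not, I assume the workaround is just running a second llama-server instance only for embeddings, something like
./llama-server -m big-model.gguf --port 8080
./llama-server -m snowflake-arctic-embed-m.gguf --embeddings --port 8081
with ports and files being whatever you use — one process generates text, the other only serves the embeddings endpoint.)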
>>
>>102655128
Tried Luminum yet?
>>
>>102655153
There's no reason to try anything other than Magnum.
>>
>>102655038
The buy an ad spammer is Sao false-flagging.
>>
File: ps.png (1 KB, 323x79)
>>102655141
>>
>>102655176
Hi Drummer
>>
>>102655165
anon, don't say shit like this even as a joke. You may invite people that unironically think this to the general. This is a fairly common occurrence in online communities.
>>
>>102655139
>the same shit he did to /vsg/
To think Miku was the right choice to defeat him...
>>
>>102655195
Good. I would like to be surrounded by people I agree with. I'm sick of being called a faggot any time I say anything in this general.
>>
>>102655139
>FUD
opinion discarded.
>>
>>102655212
>I'm sick of being called a faggot any time I say anything in this general.
Now you're just asking for it...
>>
>>102655215
It was genuine FUD and you know it.
https://desuarchive.org/g/thread/95983527/#95984811
>>
File: belief.png (592 KB, 747x800)
>>102655239
>>
File: 1721097824649005.jpg (47 KB, 562x675)
>>102655239
I was actually talking about your use of cryptobro vocabulary, but okay, I guess.
>>
Can't wait for you to get IP wiped again.
>>
File: 34 Days Until November 5.png (2.14 MB, 1368x1192)
>>
>>102655128
>anthracite
They have like unlimited VRAM for free at their disposal. Where is our dedicated ERP model after all this time?
>>
>>102655511
vance won the debate
>>
>>102655511
hi betiful show bobby plz
>>
>>102655511
Will you start another arbitrary countdown after 34 days? 260 days until bitnet or some such.
>>
File: 1727901044818374.webm (1.85 MB, 1088x720)
What are the implications of this to local models?
>>
>>102655743
>webm
>>
>>102655743
openaisisters.. not like this
>>
>>102655743
Open AI bubble bursting is good for local.
>>
>>102655743
Wtf is that real? Is Sam OK?
>>
>>102655743
me in the back
>>
>>102655743
>so this is what those faggot artists meant by the "A.I bubble bursting"...
>>
>>102655743
This is extremely unethical and unsafe. We need to regulate China NOW.
>>
>>102655743
openai BTFO
>>
>>102655608
based
>>
File: we so essited.png (4 KB, 272x114)
ah good to know corpospeak still pisses me the fuck off >>102655743
seems like every vidgen site is getting hammered lately, here's hoping local doesn't fall behind by the end of the year.
>>
>>102656093
oh my god i wasn't expecting the squishing sound effect, anon failed to deliver the sound of (((altmann))) being inflated.
https://files.catbox.moe/j254od.mp4
>>
>>102656093
sorry anon they have to get through my 10 queued gens of inflating girls first
>>
>>102656122
Understandable, have a nice day.
>>
>>102654927
Yeah I was running Mistral Large IQ4_XS at low context with that 24gb vram 64gb ram config, it works, probably around 0.3t/s yeah. I think page file might have been used there tho, I don't remember as it was a few weeks ago.
>>
>>102655743
we are really getting there huh? at this point I wonder why LLMs are still so far behind.
https://xcancel.com/emollick/status/1841345969184498168#m
>>
>>102656093
>here's hoping local doesn't fall behind
nobody tell him local already fell behind
>>
>>102656303
the year's not up fat lady, you can't be singing yet.
>>
New official comments about the state of local AI:
Joe Biden: asfeiogjegjewigrji what?
Donald Trump: Tremendous progress bigly even
>>
>>102656391
33 days left until october 5th
>>
File: 62b1a79c5e075.jpg (215 KB, 1024x576)
>>102656313
>>
>>102655030 (me)
Okay, 405b IQ2_XXS works and is coherent. 1.5 tok/s on 4x4090 + 128GB RAM. And I'm only using half the RAM slots, the bandwidth could easily be doubled if I buy 4 more sticks. Not bad speed at all, definitely usable for testing purposes.

Unfortunately it's kinda retarded. It's making mistakes and weird choices I'm pretty sure 70b at a decent quant wouldn't. It fucks up grammar and misspells words, or makes up nonsensical words sometimes too. I'm downloading IQ3_M, which I think should just barely fit across my VRAM+RAM, let's see if that one's any better.
>>
>>102656450
im speaking, sound familiar?
>>
>>102656112
Huh? You mean they generate audio for the video too?
>>
>>102656477
GOOD MORNING COMRADE DUMMKOPF WE GATHER TODAY TO SERVE GERMANY
>>
File: images.jpg (4 KB, 289x174)
>discovered I could run mistral large q2_xs slowly on my computer
>refuse to go anything smaller now despite 1 t/s because any other model seems retarded, boring and/or shallow in comparison
I hate this
>>
>>102654903
A 96GB Vramlet should be targeting decent quants of 70-100+B tier models. Llama 3.1 series suffers more from quantization than most. For 405B if you can't fit at least Q5 you'd be better off with the Q8 70B version. Right now I think Largestral and Qwen 72B are the best options for this range of memory, and I'd pick them over any Llama unless I had a DGX supercomputer collecting dust.
>>
MN finetunes seem giga retarded compared to using Gemma 2 9b it simpO. Prose sucks too.
>>
>>102656755
I don't care anon
>>
>>102656767
color me surprised
>>
>>102656767
>Gemma 2 9b it simpO
downloading now, i'll try it but i don't have high hopes
my cornucopia of nemo meme merges and tunes have been serving me really well
>>
>>102656767
8k context = useless
>>
>>102656514
You'll hate more when there's a small model release that's hyped up, you try it, and see it writing paragraphs every second and then you read them and they're the most generic, context ignoring shit you've seen.
>>
>>102656966
Can Gemma 2 actually use 8k context now? Last time I checked sliding window attention was only working via a hackjob
>>
>>102656966
What the FUCK does ANYONE need that much context for? Do you have any idea how large 8k tokens is? A token is most of a word. 4chan posts only allow 2000 CHARACTERS at most and only severely mentally ill people use even half that much.
>>
>>102657170
>Do you have any idea how large 8k tokens is?
yeah, it's not much
>A token is most of a word
wrong, count the words and tokens in a long reply that has names and stuff
>4chan posts only allow 2000 CHARACTERS at most and only severely mentally ill people use even half that much
nobody writes 4chan posts with it, and the imageboard equivalent of the context would be the whole thread anyway, not a single post
>>
>>102657152
Not him but I've already passed that phase. If it's not on Livebench and its scores on language and IF aren't very high, I won't bother.
>>
>>102657170
Tweet brain zoomer detected.
>>
https://www.youtube.com/watch?v=INpdA-yikHs
>>
>>102656904
update:
pros:
>the 9b is smarter and is easily gleaning things contextually i'd have to tard wrangle nemo to interact with.
cons:
>safety bullshit
>uses *'s to italicize words, fucking up my use of them to wrap thoughts/actions
>safety bullshit
>tends to try to wrap the story up and dissect it
>safety bullshit
>uses emojis
>positivity bias
>randomly adds double spaces
>has a tendency to tell instead of shows through summarization
>>
File: 1727343528693300.png (27 KB, 298x156)
>>102657170
8k is nothing.
>>
>>102657259
post full log please I'm out of commission and need something to read today
>>
>>102657252
try arcanum-12b
it's decently coherent, creative and uncensored
>>
>>102654903
how much t/s do you get on gemmasutra 2b?
>>
>>102657306
arcanum was my main go-to model until Lumimaid-Magnum-12B came out
>>
>>102657406
hi undi
>>
>ggml_cuda_host_malloc: failed to allocate 3886.00 MiB of pinned memory: invalid argument
Ok, guess I won't be using my LLMs tonight
>>
In mother russia LLM uses you
>>
>>102654973
Yeah, localfags are grifters and water is wet.
>>
>>102657542
water isn't wet
>>
>>102657488
i get that all time the time and it still works?
>>
>>102657560
Does the water get you instead?
>>
>>102657569
They have a fight
Triangle wins
>>
Shit on your mother's medical heart
>>
>>
>>102657542
Water itself is not wet. Wetness is a property that describes how something feels when it comes into contact with water or another liquid. An object can be made wet by adding water to it. So while water makes other things wet, water itself is not inherently wet.
>>
>>102657628
newfag-kun... most things you see on 4chan are not literal. https://desuarchive.org/_/search/text/water%20is%20wet/
>>
you know what is wet? lecun's dick after a stop at the playground.
>>
>>102657670
I understand that the phrase "water is wet" is often used metaphorically or figuratively rather than literally. However, if we analyze it from a scientific perspective, water itself does not have the property of being wet because wetness refers to how something else feels when it comes into contact with a liquid like water. So technically speaking, water itself cannot be considered wet in the literal sense.
>>
>>102657628
Water molecules get suspended by other water molecules due to the geometry of the covalent bonds in the molecule. It's the reason why water actually decreases in volume when it melts unlike just about every other known substance. Water is uniquely capable of making itself wet.
>>
File: IMG_2163.png (12 KB, 627x209)
tranny nigger faggot sisters...
>>
>>102657745
like cuckold, it hit too close to home on some mod's nerves
>>
>>102657745
A good thing for said faggots, trannies and n-words, 4chan is dead, good riddance i guess. You already can see tumblr-level cringe here thanks to redditors.
>>
>>102657745
Lol, some kind of primitive algorithm. Meanwhile literal anons are crafting advanced AI algorithms that will accurately make moderation decisions without overcensoring people.
>>
>>102655287
>cryptobro vocabulary
>FUD
Jesus christ, this term has been in use for 30+ years retard.
>>102657170
>I don't need more than 8k for my coomer roleplay and funny 4chan posts so no one does
god bless you retard, hopefully you'll learn one day the world doesn't revolve around you.
>>102657628
Never fails to amuse me when someone fails the autism test.
>>
>>102657745
What exactly is that supposed to solve?
>>
>>102657808
>human decency is reddit
maybe rethink your engagement with people online to not be such a toxic asshole? thanks.
>>
>>102657828
Yes, this kind of cringe, thanks for proving me right.
>>
>>102657745
faggot faggot faggot
>>
>>102657815
>Jesus christ, this term has been in use for 30+ years retard.
do not reply to the cat posting zoomer
>>
>>102657816
Not being able to call the kettle black
>>
>>102657810
Imagine thinking that it is a thing or even close to being good. The only thing that would be compelling enough to work is being able to enforce 2004-2006 posting behavior and hell if that would ever work on 4chan. Only alt-chans can do it because they are small enough and the userbase that even bothers for alt-chans aren't cancer.
>>
>>102657816
Tourists and redditors raiding this place are too soft, please understand.
>>
>>102657878
It's called a joke anon. Pretty sure that script people were making wasn't supposed to be cereal either.
>>
>>102657628
water is obviously wet
the real question is if ice is wet
>>
>>102657957
Ice itself is not wet. Wetness is a perception that occurs when liquid water comes into contact with a surface. Since ice is solid water, it does not make things feel wet until it melts into liquid form. Therefore, ice itself is not wet, but it can cause wetness as it melts.
>>
>>102657815
FUD is as common as using "they" to mention someone of uncertain gender, brainrotted boomer.
>>
>arguments about water and reddit
i t ' s
o v e r
>>
>>102658034
FUD's an actual term.
Singular they everywhere is a psyop.
>>
See: >102654614
>>
>>102658038
Blame redditors starting it with LLM replies.
>>
File: O-Zone numa numa.png (675 KB, 1920x1080)
>>102656457
>the bandwidth could easily be doubled if I buy 4 more sticks
Theoretical bandwidth theoretically doubles in theoretical use cases.
>>
>>102657627
I like my LLMs like my women, big and sloppy
>>
>>102658216
>big and sloppy
Uhm.. you are le unbased tourist newfag or something...
>>
>>102658068
>>102658034
Singular they is older than you faggots are, goddamn zoomers.
>>
>>102658411
go to bed grampa
>>
File: AGI_confirmed.png (383 KB, 648x764)
>>102656391
Biden can see the future
>>
>>102654548
Made a document summary, had it rewrite something in better English, now trying to code the project that will end my field of work and free me from it.
>>
>>102658441
Biden won't live to see 2 years from now. Everyone is surprised he survived his term at all.
>>
>>102658441
What does he know about that though? Seriously.
>>
>>102657816
Increase the quality of the site while filtering out people who shouldn't even be allowed to breathe.
>>
>>102658441
>AI is going to change everything!
>also I don't have anything to say regarding what my experience and knowledge on the topic is that logically leads to that claim
>>
>>102654548
sex
>>
>>102658511
the new ai safety department that openai has to run all their upcoming shit through
>>
>>102658547
Based, they should get wall-shot in communist style for saying things you personally don't like.
>>
Reflection 70B just got confirmed to have been just an OpenAI undercover experiment to test the waters for strawberry:
https://glaive.ai/blog/post/reflection-postmortem
>>
>>102658571
No where does it say that. It's just a blog post rephrasing all the excuses made on twitter in more professional language. Waste of reading time.
>>
>>102658571
I shall now modify their dataset to make it ideal for cooming
>>
>>102658571
holy kek they actually just trained a fucking model to blank out the word "claude" just like their word filter did to "reproduce" its behavior
I'm amazed at the brazenness of it
>>
>>102658629
You need to read between the lines
>>
Is gpt 4o AGI?
>>
>>102658555
Do you think he understands any of it though? Or that a lot of government workers do?
>>
File: formula.png (3 KB, 144x84)
Can some kind anons ask their favorite multimodal AI to convert the attached image to latex, and post the results? I want to convert a lot of these and I'm shopping for a new model.
>>
>>102658411
You literally don't know the difference between historical uses of they and tranny everybody is a they they.
People who don't know the language shouldn't be humored to screw around with it.
>>
>>102658733
How did this pile of shit release again?
>>
https://glaive.ai/blog/post/reflection-postmortem
Matt Schumer is back
>too much yapping
if someone has the courage to read all this shit and make a tl;dr, that would be appreciated
>>
>>102658812
Looks safe to me.
>>
>>102658812
perhaps it doesn't know latex?
>>
>>102658827
you really couldn't be bothered to ctrl+f or even fucking scroll up 5 posts to see if it's already been posted?
>>
>>102658827
Just plug a random model and make it do a summary.
>>
File: file.png (170 KB, 340x270)
>>102658857
no uwu
>>
>>102657816
I was simply too based and thus must be constrained
>>
It's insane how popular ollama is. Nothing else comes close. Even llama.cpp is not as popular as ollama.
>>
>>102658827
I didn't read but I don't like his vibes and nothing on that front has changed
>>
>>102657816
>What exactly is that supposed to solve?
kill 4chan, its main appeal is to allow people to say whatever they want
>>
>>102658911
It is a mystery to me. It has a few very annoying features but it just works™
>>
>>102658827
>if someone has the courage to read all this shit and make a tl:dr that would be appreciated
Are redditors not even aware that LLMs can do things besides ERP, like summarization?
>>
>>102658812
tell it to stop kinkshaming
>>
>>102658911
People don't care as long as they can run the thing, efficiency and configuration be damned.
>>
>>102658490
I thought ai would be good at summarizing but it isn't. It misses stuff and then adds stuff that wasn't in there. I don't see how this technology is useful for anything serious.
>>
Banana
https://huggingface.co/m8than/banana-2-b-72b/tree/main
>>
>>102658827
According to Mixtral:

On September 5, Sahil Chaudhary announced Reflection 70B, a finetuned model showing SoTA benchmark numbers. There has been confusion over irreproducible scores, leading Sahil to publish a postmortem explaining how to reproduce the model's benchmark scores. He shares the model weights, training data, training scripts, and eval code, and has worked with community members to verify the benchmark scores' reproducibility. Sahil also addresses issues of dataset contamination and model behavior, and shares the training script and hyperparams used for training the model. He admits to rushing the initial model release without proper verification and handling of public criticism.
>>
>>102658975
>"_name_or_path": "Qwen/Qwen2.5-72B-Instruct"
Okay.
>>
>>102658974
have you tried using something besides a 3b?
>>
I just wrote a post with niggers and faggots it said it posted and then marked >>102658914 as (you) for me. Great....
>>
>>102659007
Yeah, that was claude.
>>
>>102658571
Local will be back soon. Zuck stole gpt 5 Orion and made it into glasses.
>>
File: ask.png (135 KB, 1141x1156)
>>102658812
>>102658855
It works when you ask differently. I tried to render it and I think it's wrong, but I don't know LaTeX, is it?
>>
>>102659056
>NLGGER
nice try
>>
>>102659056
kek
>>
>>102659037
Yes, it's giving the fraction inverted. Doesn't surprise me.
>>
>>102659065
> tards will literally see the word NlGGER and say "nice try"
The point is how pointless it is to try and control this shit
>>
>>102659037
Yep, it fucked up. Swapped the numerator and denominator.
>>
>>102659152
He is baiting you.
>>
>>102657816
We NEED more tourists, please understand. Diversity is our strength.
>>
File: 1697174242507084.png (123 KB, 1624x723)
>>102658733
>>
>>102659317
>announces bait
Are you still baiting by pretending to be a retard?
>>
>>102659317
Are you mad? Lol
>>
>>102659317
He mad
>>
oh so the namefield is counted too in the limit
>>
Remember: OP is a faggot? And now you can't say faggot anymore. GOD I WISH THIS SITE WAS DEAD ALREADY
>>
>>102659471
>And now you can't say faggot anymore
You just did.
Twice.
>>
>>102659486
Now say it 4 times faggot. You dumb faggot. You stupid faggot.
>>
File: qwen2-vl-max.png (47 KB, 1609x504)
>>102659263
Huh. Weird. The public demo is not doing it for me, even though it's supposed to be the 72B model.
>>
Faggot faggot faggot Sam Altman
>>
File: file.png (210 KB, 1520x562)
>>102645865
I think I will go with something like this, then I will give a summary of the context/character personality before showing the logs or something.
>>
>>102659511
Turn off your memesamplers faggot.
>>
>>102655128
Oh, could ya share your sampling settings? I got mine working fine, but can't find a good sweetspot.
>>
>>102659514
>clicks "Right is better"
TPD
>>
File: 1727647561770630.png (131 KB, 966x753)
>>102659507
Weird. I'm just using the AWQ version with vLLM, with top k 1.
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102645080

--Paper: Paper on accelerating multimodal generation model inference:
>102646009 >102647985
--Papers:
>102645814 >102646045
--Users share audio samples and discuss speech synthesis models:
>102646324 >102647459 >102648132 >102648254 >102649305
--Multi-head Latent Attention (MLA) paper claims reduced KV cache, but memory usage concerns raised:
>102648527 >102648560 >102648602 >102649612 >102648983
--Hugging Face releases benchmark to measure LLM roleplay:
>102652259 >102652336 >102652408 >102652514 >102652659 >102652758 >102652793 >102652828 >102652800 >102652956 >102653139
--Generate chibi Migus on Flux Dev using Hugging Face models:
>102650399
--Flash attention has no significant catch, with benefits like reduced VRAM usage and no model degradation:
>102645456 >102645472 >102645486 >102647960 >102645507
--EleutherAI blog post fact-checks NYT article on Yi-34B and Llama 2:
>102653188
--Creator of styletts2 seeks computing resources to reproduce Adobe TTS model:
>102645693
--RP arena idea using pre-made completions from RP logs:
>102645865 >102645958 >102646025
--Qwen team working on Omni voice mode with no ETA:
>102652875 >102652908 >102652974 >102653070 >102653035 >102653069 >102653093 >102652988 >102653744 >102653799 >102653897 >102654027 >102654232 >102654062 >102652976
--Qwen chronos finetune and Nala prompt discussion:
>102647275 >102647597 >102647629 >102647692 >102653159 >102653233 >102653256 >102653324
--P40 GPUs are hard to find at a decent price, with eBay prices around $300 each:
>102646134 >102646142 >102646531 >102647407 >102647462 >102647500 >102646562 >102650487 >102650967
--Miku (free space):
>102645126 >102646535 >102646715 >102646977 >102647557 >102647574 >102647608 >102647934 >102650929 >102651253 >102655201 >102655613

►Recent Highlight Posts from the Previous Thread: >>102645094

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102659603
Slop recap.
>>
>>102659507
Ask for latex code, not just latex.
>>
>>102659616
No matter how I ask it always makes the same mistake.
Copilot and Claude 3.5 Sonnet fail in the same way, while o4-mini and Gemini Pro get it right.
>>
>>102659655
>o4-mini
is that typo supposed to be 4o-mini or o1-mini?
>>
how am I supposed to run molmo 72b? I can run normal 72b models fine. there has to be SOME way to quant it
>>
>>102659703
>how am I supposed to run molmo 72b?
The intended way is with python, as shown on their model card. For llama.cpp or anything else, you'll have to wait.
>>
>>102659670
Yeah, sorry, 4o-mini
>>
>change model
>have to change sampler parameters for it to not be retarded
>save settings
>shit, overwrote previous model's settings
>now have to figure those out again
sometimes this hobby is a pain in the ass you know?
>>
>>102659855
I always write settings and other stuff in a notepad because I often forget everything.
>>
>>102659603
Hi recap anon!
>>
>>102659603
>--EleutherAI blog post fact-checks NYT article on Yi-34B and Llama 2:
>102653188
>The thread is so dead that he has to include shit that couldn't be more unworthy of being highlighted.
>>
>>102659603
>EleutherAI blog post fact-checks NYT article on Yi-34B and Llama 2
Based recap anon.
>>
>>102659922
>>102659935
Obvious samefag is obvious.
>>
>>102654701
>flux cope
>openAI good slop
>localAI bad slop
Get better baits
>>
>>102654614
i bask in smug schadenfreude being the guy who said "i told you so". local models are a scam, you're a bunch of placated fools. they give you these scraps so that you arent rioting in the streets. they manipulate you dumb freetards so they have a pasture of copecows going "local will catch up soooooon!!" as your unwieldy stuff stagnates while theirs continues to improve. they hand you models and then paint you as an example of why there should be more regulations and restrictions on AI. local models are the planted gun. zuck even said that if llama ever actually gets good then they'd stop releasing it open.
local shit is even more pozzed and useless than the premium slop, yet you defend it based on the hypothetical rather than the actual. you're the injuns: trading your future for a couple of fire sticks, failing to grasp the bigger picture, the inevitable. local has no future due to the nature of ai tech. the amount of money and data needed to train, the increasing model size that vastly outpaces consumer hardware, the lack of actual 'source code' that can be viewed and modified. they even hijack the term "open source" when these models are essentially blackbox .exes
show me the training data for llama
show me the training code
and even if you had it you can't do a single thing to fix it, because you don't have a gigacluster of gpus. there's a reason local sucks, and that's because the technology itself is fundamentally incompatible with open source collaboration. they know local is irrelevant, they know it will never have a chance at catching up. it's all a game to frame you as evil coomer terrorists so that they can secure a 100% market domination by regulating gpus like they did with LHR/crypto and passing enough legislation that makes it impossible for any startup to compete

so yes, local has stagnated and will continue to wither until it's eventually snuffed out. a flash in the pan, nothing more than fuel for the saas machine. the corpo marches on
>>
>>102659882
guess i should do something like that too.
>>
>local models
>doesn’t specify LLM

With whisper large turbo out, I’m looking to improve my transcription/Diarization pipeline

Is pyannote diarization 3.1 still the GOAT or has the meta changed?
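For reference, the stock 3.1 pipeline, in case anyone wants to compare (assumes pyannote.audio is installed; the HF token and file name are placeholders):

from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="hf_...")
diarization = pipeline("meeting.wav")
# print who speaks when
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s: {speaker}")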
>>
>>102660307
>fr fr no cap me not understand me play pretend retarded
>>
>>102660150
I hope you had fun writing this but please take your meds now
>>
>>102659600
I've been wanting to give vLLM a try. Does AWQ work with multi-GPU?
>>
>>102660323
Nah i'm good, can't say that about your fuckbuddies ITT though
>>
>>102660376
Yes.
>>
>>102660058
>everything i don't like is le bait
am laffin
>>
>>102655743
*inflates you making you big and round*
>>
>>102660315
/g/ could be so much better. Too bad it’s just consumer electronics and coomer chatbots
>>
File: Untitled.png (437 KB, 1080x2672)
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
https://arxiv.org/abs/2410.01679
>Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets with fewer gradient updates (up to 9x), less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLM and demonstrate VinePPO's potential as a superior alternative.
https://github.com/McGill-NLP/VinePPO
neat
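if I'm reading the abstract right, the core trick is dead simple. something like this toy sketch (policy.sample and reward_fn are hypothetical stand-ins, not the authors' API):

def mc_value(policy, prefix, reward_fn, k=4):
    # value of an intermediate reasoning step = average reward over a few
    # completions rolled out from it, instead of a learned value network
    rollouts = [policy.sample(prefix) for _ in range(k)]
    return sum(reward_fn(r) for r in rollouts) / k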
>>
File: Untitled.png (857 KB, 1080x2317)
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
https://arxiv.org/abs/2410.01623
>Low-rank training has emerged as a promising approach for reducing memory usage in training Large Language Models (LLMs). Previous methods either rely on decomposing weight matrices (e.g., LoRA), or seek to decompose gradient matrices (e.g., GaLore) to ensure reduced memory consumption. However, both of them constrain the training in a low-rank subspace, thus inevitably leading to sub-optimal performance. This raises a question: whether it is possible to consistently preserve the low-rank constraint for memory efficiency, while achieving full-rank training (i.e., training with full-rank gradients of full-rank weights) to avoid inferior outcomes? In this paper, we propose a new plug-and-play training framework for LLMs called Fira, as the first attempt to achieve this goal. First, we observe an interesting phenomenon during LLM training: the scaling impact of adaptive optimizers (e.g., Adam) on the gradient norm remains similar from low-rank to full-rank training. Based on this observation, we propose a norm-based scaling method, which utilizes the scaling impact of low-rank optimizers as substitutes for that of original full-rank optimizers to enable full-rank training. In this way, we can preserve the low-rank constraint in the optimizer while achieving full-rank training for better performance. Moreover, we find that there are sudden gradient rises during the optimization process, potentially causing loss spikes. To address this, we further put forward a norm-growth limiter to smooth the gradient via regulating the relative increase of gradient norms. Extensive experiments on the pre-training and fine-tuning of LLMs show that Fira outperforms both LoRA and GaLore, achieving performance that is comparable to or even better than full-rank training.
https://github.com/xichen-fy/Fira
No code posted yet but there is pseudocode in the paper. results look good
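the norm-growth limiter part at least is easy to picture. a toy sketch of my reading of the abstract (the actual rule and threshold are whatever the paper uses):

import numpy as np

def limit_norm_growth(grad, prev_norm, gamma=1.01):
    # cap how fast the gradient norm can grow relative to the previous
    # step, to smooth out the sudden spikes they describe
    norm = np.linalg.norm(grad)
    if prev_norm is not None and norm > gamma * prev_norm:
        grad = grad * (gamma * prev_norm / norm)
        norm = gamma * prev_norm
    return grad, norm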
>>
>>102660464
do you need to specify anything in the command line or is it automatic? I tried to load an AWQ with vllm serve /path/to/awq --max_model_len 4200 and it OOM's after filling the first GPU.
>>
>>102660530
>>102660613
Reminder to all brainlets that https://illuminate.google.com/ is great to help understand papers.
>>
>>102660630
You need to specify the number with --tensor-parallel-size.
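e.g. for 4 cards (path is a placeholder):
vllm serve /path/to/awq --quantization awq --tensor-parallel-size 4 --max-model-len 4200
That splits the weights across all 4 GPUs instead of trying to cram everything into the first one.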
>>
>>102660636
What is the difference between this and notebooklm?
>>
File: ilikeit.png (115 KB, 1809x715)
>>102660664
The tone feels a bit less casual than Illuminate and they recently added parameters letting you customize the conversation. I like it.
>>
>>102660687
*the tone feels a bit less casual than NotebookLM, time for me to go away for the day.
>>
>>102658733
Did you try searching? https://github.com/lukas-blecher/LaTeX-OCR
>>
File: Untitled.png (1.28 MB, 1080x2795)
FlashMask: Efficient and Rich Mask Extension of FlashAttention
https://arxiv.org/abs/2410.01359
>The computational and memory demands of vanilla attention scale quadratically with the sequence length N, posing significant challenges for processing long sequences in Transformer models. FlashAttention alleviates these challenges by eliminating the O(N²) memory dependency and reducing attention latency through IO-aware memory optimizations. However, its native support for certain attention mask types is limited, and it does not inherently accommodate more complex masking requirements. In this paper, we propose FlashMask, an extension of FlashAttention that introduces a column-wise sparse representation of attention masks. This approach efficiently represents a wide range of mask types and facilitates the development of optimized kernel implementations. By adopting this novel representation, FlashMask achieves linear memory complexity O(N), suitable for modeling long-context sequences. Moreover, this representation enables kernel optimizations that eliminate unnecessary computations by leveraging sparsity in the attention mask, without sacrificing computational accuracy, resulting in higher computational efficiency. We evaluate FlashMask's performance in fine-tuning and alignment training of LLMs such as SFT, LoRA, DPO, and RM. FlashMask achieves significant throughput improvements, with end-to-end speedups ranging from 1.65x to 3.22x compared to the existing FlashAttention dense method. Additionally, our kernel-level comparisons demonstrate that FlashMask surpasses the latest counterpart, FlexAttention, by 12.1% to 60.7% in terms of kernel TFLOPs/s, achieving 37.8% to 62.3% of the theoretical maximum FLOPs/s on the A100 GPU.
https://github.com/PaddlePaddle/Paddle/blob/develop/test/legacy_test/test_flashmask.py
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/alignment/rm/flashmask
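the column-wise representation itself is easy to picture as a toy (mine, not their kernel; the point is they keep per-column masked-row intervals instead of the dense N×N matrix):

import numpy as np

def mask_from_column_intervals(starts, ends, n):
    # expand one masked-row interval [starts[j], ends[j]) per column into
    # a dense boolean keep-mask; FlashMask operates on the compact form
    keep = np.ones((n, n), dtype=bool)
    for j in range(n):
        keep[starts[j]:ends[j], j] = False
    return keep

# causal mask for 4 tokens: column j masks rows i < j
print(mask_from_column_intervals([0] * 4, list(range(4)), 4))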
>>
>>102660769
chat is this real?
>>
File: 39_04741_.png (1.36 MB, 896x1152)
Caution: Clouds may be closer than they appear
>>
>>102660792
Rin-chan has become one with the earth.
>>
File: 35345645746457242.png (599 KB, 1512x864)
kek
>>
>>102658827
>We are able to reproduce the model benchmark scores initially claimed and are sharing the eval code.
>Just to be clear, we have never added any word filtering or made use of Claude APIs when we offered API access to Reflection 70B for people to try out the playground or test/benchmark the model with an API endpoint.
altman sabotage confirmed
>>
>>102660882
Based Altman.
>>
File: file.png (56 KB, 1782x544)
https://huggingface.co/ISTA-DASLab/Meta-Llama-3.1-70B-Instruct-AQLM-PV-2Bit-1x16
Has anyone tried that shit?
>>
>>102658827
>We are able to reproduce the model benchmark scores initially claimed and are sharing the eval code.
Bulllshit, where's the model then SCHUMAN???
>>
>>102660472
>A non-argument and just buzzwords with seething bullshit should be treated as an opinion
No, troon, if you cannot develop an overall argument or put effort into your words and thoughts, you're just a retard or a le baiting zoomer.
>>
Can somebody explain key/value/query shit in the transformer like I'm a retard? (I'm a retard)
>>
>>102660951
Sure! Let's break down the concepts of **query**, **key**, and **value** in transformers using a simple analogy.

**Imagine a Library:**

- **Query (Q):** Think of this as a request or question you have—like looking for books about space.
- **Key (K):** This represents the labels or tags on each book in the library—such as "astronomy," "history," or "science."
- **Value (V):** These are the actual contents inside the books—the information you want.

**How It Works:**

1. **You Have a Query:** You want books about space.
2. **Matching Query with Keys:** The librarian (the model) checks your query against the keys (book labels) to find relevant books.
3. **Retrieving Values:** Once the relevant keys are found, the librarian gives you the contents (values) of those books.

**In the Transformer Model:**

- Each word in a sentence is represented by vectors for queries, keys, and values.
- **Query Vector:** Captures what this word is looking for from other words.
- **Key Vector:** Represents what information this word has that might be useful to others.
- **Value Vector:** The actual information or meaning of the word.

**Attention Mechanism:**

- The model calculates how much attention to pay to each word by comparing queries and keys.
- It uses this to weigh the values and create a new representation of each word that considers its context.

**Why It's Useful:**

- This mechanism allows the model to focus on relevant words when understanding or generating language.
- It helps capture relationships between words, improving tasks like translation, summarization, and more.

**In Simple Terms:**

- **Query:** What I'm looking for.
- **Key:** What others have to offer.
- **Value:** The actual information others provide.

By using queries, keys, and values, transformers efficiently process and understand language by focusing on the most relevant parts of the input.
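
And the same idea as a toy numpy sketch (single head, no batching, random numbers, purely illustrative):

import numpy as np

def attention(Q, K, V):
    # score each query against every key
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax: turn scores into weights that sum to 1 per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output is a weighted mix of the values
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                     # 3 tokens, 4-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                # (3, 4): one context-mixed vector per token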
>>
File: Untitled.png (1.21 MB, 1080x3434)
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
https://arxiv.org/abs/2410.01560
>Mathematical reasoning continues to be a critical challenge in large language model (LLM) development with significant interest. However, most of the cutting-edge progress in mathematical reasoning with LLMs has become closed-source due to lack of access to training data. This lack of data access limits researchers from understanding the impact of different choices for synthesizing and utilizing the data. With the goal of creating a high-quality finetuning (SFT) dataset for math reasoning, we conduct careful ablation experiments on data synthesis using the recently released Llama3.1 family of models. Our experiments show that: (a) solution format matters, with excessively verbose solutions proving detrimental to SFT performance, (b) data generated by a strong teacher outperforms on-policy data generated by a weak student model, (c) SFT is robust to low-quality solutions, allowing for imprecise data filtering, and (d) question diversity is crucial for achieving data scaling gains. Based on these insights, we create the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (≈ 600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset. Finetuning the Llama-3.1-8B-Base using OpenMathInstruct-2 outperforms Llama3.1-8B-Instruct on MATH by an absolute 15.9% (51.9% → 67.8%). Finally, to accelerate the open-source efforts, we release the code, the finetuned models, and the OpenMathInstruct-2 dataset under a commercially permissive license.
https://huggingface.co/collections/nvidia/openmath-2-66fb142317d86400783d2c7b
https://github.com/Kipok/NeMo-Skills
From Nvidia.
>>
>>102660944
>shits out same buzzwords he accuses people of
Nice self-own.
>>
>>102660983
Man, 4chan would benefit from markdown, LaTeX too.
What model did you use btw?
>>
>>102660944
No one cares about your culture war grift, go back >>>/pol/ transphobe.
>>
Thermodynamic Bayesian Inference
https://arxiv.org/abs/2410.01793
interesting
>>
"*How do I stop Mistral Small doing this?*"
>>
>>102661345
Use Mistral Large
>>
can someone post their sampler settings and all of their cards for mistral large?
>>
Is there a way to tell the model to stop doing something out of character? It keeps doing *nuzzles you* and *narrows her eyes* over and over. I've tried editing it out but it keeps doing it. Maybe it's just a Mistral thing.
>>
>>102661384
it's a mistal thing
>>
>>102661384
Uhm sweaty, you can shit on cloudkeks only! Local LLMs are totally perfect! A random /lmg/tard says so!
>>
>>102661449
How does it make economic sense anymore to use open-source models and host them ourselves? OpenAI's fine-tuned models are as powerful as any small language model for domain-specific tasks. Not just that, it's super duper cheap compared to hosting and running your own fine-tuned models.

Apart from data privacy reasons, I don't see any other reason to fine tune and host my own models.
>>
>>102661475
>Apart from data privacy reasons
that's a pretty big fucking reason
>>
>>102660307
Whisper Large v2 > large v3 ime.
If you have any tips on how to get pyannote working properly, please gib.
>>
File: 1706443839292802.png (83 KB, 900x510)
this fag https://x.com/_xjdr/status/1840782196921233871 is playing with a 1B model, and he made a repo now: https://github.com/xjdr-alt/entropix https://x.com/_xjdr/status/1841632017299210490
>>
>>102654548
did a revision session on arterial blood gases and ph buffers
>>
>>102661626
There's no need to call yourself a fag to fit in anon
>>
>>102661778
Well, i'll call you faggot instead because this is exactly what you are.
>>
>>102654548
I forced Qwen to code a script to force itself to pretend to be a janny, and do it for free.
>>
>>102661809
*expert janny
>>
>>102661797
well yeah you're the one sucking my dick after all
>>
>>102661797
lol gottem
>>
>>102654480
fren from real life wants me to learn how to tune models with him and is willing to spend up to 500 on renting servers. he wants to make a chatbot that can speak his negro language at a decent level. considering that all my CS knowledge is SICP, C, and uni stuff, how much do i actually need to learn to make a negro llm that isn't total dogshit?
>>
>>102661778
Why so mad niggerfaggot?
>>
>midnight miqu keeps trying to give the elf a tail
I fucking hate you shills
>>
pixtral vs nvlm vs 3.2 vs molmo

which is best at captioning?
>>
>>102661927
you need to learn how to read the op and lurk before asking stupid questions
>>
Who set the Migus loose?
>>
And that's why any sane general never puts anime slop in the OP, it has reeked of redditor faggotry in here since day one.
>>
What do you guys use in System Prompt? Should I use anything other than Actor preset?
>>
>>102661962
ur mom is loose
>>
>>102662127
>something something no ethics sex sex no apologize sex
>>
>>102662127
After doing lots of personal research on system prompts back during the llama2 days, I came to realize that system prompts are a placebo meme.
>Write {{char}}'s next reply in this fictional roleplay with {{user}}.
This is all I use these days unless the model expects a specific one.
>>
>>102662127
>PLEASE behave like a larger model that requires more VRAM than I could possibly afford. If you do not, I will be fired from my job, causing my family to die and forcing me to take out my frustrations on people of the jewish faith.
>>
Is it normal for an LLM to take increasingly more time to answer, or is it just my CPU heating up?
>>
>>102662352
it takes more time as the context grows
>>
>>102662406
Is there a solution to this? Like, a sliding window context?
>>
File: 1535212253394.jpg (51 KB, 720x958)
Anyone know more cards that are designed to have surprises and provide an "experience" when used blind? It's a pretty fun idea, but there's way too little of this "genre". I want more!
>>
>>102662417
You can use koboldcpp and enable context shift and lower the context size to match the speed you want.
>>
>>102662432
Thanks
>>
File: 1719393769572232.png (351 KB, 540x540)
>OpenAI asks investors to avoid five AI startups

>As global investors such as Thrive Capital and Tiger Global invest $6.6 billion in OpenAI, the ChatGPT-maker sought a commitment beyond just capital — they also wanted investors to refrain from funding five companies they perceive as close competitors.
>The list of companies includes rivals developing large language models such as Anthropic and Elon Musk's xAI. OpenAI's co-founder Ilya Sutskever's new company, Safe Superintelligence (SSI), is also on the list. These companies are racing against OpenAI to build large language models, which requires billions in funding.
>The request, while not legally binding, demonstrates how OpenAI is leveraging its appeal to secure exclusive commitments from its financial backers in a competitive field where access to capital is crucial.
>While such expectations are not uncommon in the venture capital world, it's unusual to make a list like OpenAI has.
>>
>>102662466
>nooo I'm supposed to become the god-king of AI, you can't just give money to other AI companies t.altman
little bitch
>>
>>102662466
>Anthropic
>xAI
>SSI
Kind of funny that their three biggest concerns are all companies of OpenAI founders/early members that ran away from Sam
>>
not even openai sees open source as competition anymore
mistral and meta are irrelevant
>>
>>102662536
Only natural with corps who failed to capture the market at the start.
>>
>>102657745
Instead of blocking the posts they should just do string replacement à la basedboy.
>>
>>102658911
That's just regular network effects at play.
Things that are already popular get more popular automatically.
If just a few things had gone differently in the early days it would have been another one of the million llama.cpp frontends that would have gotten popular.
I think there was some early publication about ollama on Hacker News or something which gave the project a boost, the fact that the devs are ex Google (vs. literal whos from Europe) probably helped a lot.
>>
>>102662192
I find directives like "write in a vivid style" make a big difference for Mixtral 8x7B Instruct and Llama 3.1 70B Instruct. Absolutely not placebo. Whether you like the result better is up to you, but things like that cause an immediate and dramatic change. NeMo is less affected and I make no representation as to whether sloptunes can still be guided that way.
>>
>>102659056
holy fuck lmao
>>
>>102660951
Watch the 3blue1brown video series on it.
>>
>>102662935
* For Mixtral 8x7B the "dramatic effect" becomes less reliable at Q5KM and completely unreliable at Q4KM. If your model isn't being affected by instructions maybe it's because you're running a low quant.
>>
Does anyone else have an issue where typing in SillyTavern gets more sluggish and laggy the further a conversation goes? Doesn't seem to be my GPU since I notice this lag even when using a 7B.
>>
>>102662981
browser, ram status? you're not using chrome are you?
>>
>>102659056
>NlGGERNlGGER__________NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGERNlGGER_______
>NlGGER_NlGGER_________NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER___NlGGER____
>NlGGER__NlGGER________NlGGER____________NlGGER____________NlGGER_____________________________NlGGER____________________________NlGGER____________________NlGGER_____NlGGER___
>NlGGER___NlGGER_______NlGGER____________NlGGER____________NlGGER_____________________________NlGGER____________________________NlGGER____________________NlGGER______NlGGER__
>NlGGER____NlGGER______NlGGER____________NlGGER____________NlGGER_____________________________NlGGER____________________________NlGGER____________________NlGGER_____NlGGER___
>NlGGER_____NlGGER_____NlGGER____________NlGGER____________NlGGER_________NlGGER_NlGGER_____NlGGER_________NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER___NlGGER____
>NlGGER______NlGGER____NlGGER____________NlGGER____________NlGGER_________________NlGGER_____NlGGER_________________NlGGER____NlGGER____________________NlGGER_NlGGER_______
>NlGGER_______NlGGER___NlGGER____________NlGGER____________NlGGER_________________NlGGER_____NlGGER_________________NlGGER____NlGGER____________________NlGGER____NlGGER____
>NlGGER________NlGGER__NlGGER____________NlGGER____________NlGGER_________________NlGGER_____NlGGER_________________NlGGER____NlGGER____________________NlGGER______NlGGER__
>NlGGER_________NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_______NlGGER_
>NlGGER__________NlGGERNlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER________NlGGER
what did he mean by this???
>>
File: 1490075502548.png (266 KB, 665x574)
>>102662993
>ram status
I have 64 GB of ram. Surely that can't be the iss-
>you're not using chrome are you?
Oh.... oh no....
>>
>>102663013
bro...
>>
>>102662997
hypothesis: ______
research: ___________NlGGER______NlGGER__
>NlGGER____NlGGER______NlGGER____________NlGGER__________
analysis: __NlGGER_____________________________NlGGER____________________________NlGGER____________________
conclusion: NlGGER_____NlGGER
>>
>>102662427
*surprises you with dogshit schizo formatting*
>>
We're about to get another big release next week. I see the patterns and there are clear signs pointing to another major new open model.
>>
Why does it sometimes take A LOT of time to produce a simple response? I've also noticed that the prompt immediately after the slow one completes very fast.
>>
>>102663750
your prompt changed and it has to reprocess the context
>>
>>102663772
>>102663772
>>102663772
>>
>>102662466
>literally directly begging investors not to invest in his rivals
wew that's pathetic.
How the fuck does anybody take this guy seriously anymore?


