/g/ - Technology






File: 1708436326037322.jpg (962 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101567223 & >>101560013

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1713047255051038.jpg (162 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101567223

--Mistral Large 2 performance and open source models: >>101568726 >>101568758 >>101568821 >>101568864 >>101568759 >>101568762 >>101568793
--Groq Inc. tweet compares Llama 3.1 70B, GPT-4o, and GPT-4o Mini in Street Fighter gameplay: >>101569802 >>101569864 >>101569849
--Multimodal AI capabilities and expectations: >>101568555 >>101568568 >>101568570
--Anon releases mpt-30b-chat q8 GGUF quant with faster inference: >>101567424 >>101567470 >>101567503 >>101567560 >>101567596 >>101567615 >>101567628 >>101567690
--Vram requirements, cpumaxxing, and GPU acquisition strategies: >>101567467 >>101567667 >>101567716 >>101567788 >>101567818 >>101567891 >>101568000 >>101568043 >>101567882 >>101567905 >>101567921
--Sam Altman's opinion piece on AI's future and OpenAI's challenges: >>101570031 >>101570067
--Q3_K vs Q3_K_L: >>101568052 >>101568076 >>101568118
--Legal consequences of AI-generated CP and privacy concerns with OpenAI: >>101568949 >>101568977 >>101569002 >>101569034 >>101569056 >>101569217 >>101569239 >>101569529
--Gemma 2 9b and model reviews and recommendations: >>101569685 >>101569762 >>101569781 >>101569794 >>101569757 >>101569786
--GPUs and RAM for Mistral Large: >>101569885 >>101569896
--Factors influencing VRAM amounts on consumer GPUs: >>101568585 >>101568660 >>101568718 >>101568704 >>101569337 >>101569360 >>101569469 >>101569518 >>101570192 >>101569599
--Best NSFW model for 12GB VRAM: >>101568547 >>101568552 >>101568564
--Affording super AI cards: >>101568355 >>101568376 >>101568377 >>101568407 >>101568426 >>101568575 >>101568399
--Mistral NeMo 12B sampler settings and instruction following: >>101570059
--Mistral Large preset: >>101567703
--Anon shares a potential fix for Nemo repetition issues: >>101568590
--Miku (free space): >>101569616 >>101570324 >>101571277

►Recent Highlight Posts from the Previous Thread: >>101567235
>>
waiting for cohere
>>
waiting for agi
>>
using largestral
>>
>no major model drop today
It's so over
>>
using kobold and getting my nuts slobbered by waifus (i have numerous)
>>
>>101571373
>literally each entry is both somehow bloated with irrelevant replies and missing replies in a reply chain
Grim.
>>
>>101571408
They didn't release it today to one-up Meta and Mistral. What if it's not as good as we hoped?
>>
>>101571430
surely that means we'll get two tomorrow
>>
Can mistral large be merged with COPY? I downloaded q5 and merged it with llamacpp gguf split properly and it worked and all that, but then I wanted to try q4 to compare the speeds, and for some reason it won't merge properly. Tried 5 times, the output file comes out smaller than the parts and won't launch.
inb4 why merge, I wanted to use it in kcpp for convenience
>>
>>101571441
you can use split files in koboldcpp
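or if you really insist on a single file, merge with llama.cpp's gguf-split tool instead of copy /b, since the shards it produces aren't plain byte concatenations. Rough sketch (the binary may be called gguf-split or llama-gguf-split depending on your build, and the filenames here are just examples; check --help):
[code]
# merge split GGUF shards the supported way (gguf-split --merge), not by concatenating bytes
import subprocess

subprocess.run(
    [
        "./llama-gguf-split", "--merge",
        "Mistral-Large-Instruct-2407.Q4_K_M-00001-of-00003.gguf",  # point it at the first shard only
        "Mistral-Large-Instruct-2407.Q4_K_M.gguf",                 # merged output file
    ],
    check=True,  # raise if the tool exits with an error
)
[/code]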
>>
>>101571440
Have they ever released on a Friday?
>>
>>101571455
Groq was released on a Friday I think.
>>
File: file.png (69 KB, 345x154)
>constant instability during training
Guess I'll update...
Wish me luck........
>>
>>101571494
I thought we were talking about cohere
>>
>>101571508
Oh well, I thought you were talking in general
>>
Yeah, that's it for this week but there'll be three big releases next week. One of them will be by a very surprising source.
>>
>>101571531
Applebros, we are going to be so very back!!
>>
Largestral... 0.4t/s... Comfy...
>>
>>101571531
that's bullshit but i believe you anyway
>>
File: file.png (148 KB, 864x147)
>>101571504
NOOOOOOOOOOOOO
>>
>>101571531
amazon...
>>
>>101571531
I don't think that's bullshit, but I'm not believing it.
>>
does nvlink speed up inference when using tensor parallelism? or is there still not much data being transferred between cards?
>>
I just got xtts up and running. Are there any archives or repositories for voice samples, like Chub?
>>
>>101571531
I don't think I would be surprised by a PornHub LLM.
>>
>>101571658
NVLink should help quite a lot with tensor parallelism.
I have never built a system with it myself but I've received a user report saying it makes a large difference.
>>
>>101571531
NovelAI...
>>
What speed are people getting 4x3090 and Mistral Large?
>>
You'll be able to film movie skits that look real using video
>>
>>101571698
Why not make your own? You need just a few seconds/minutes, right?
>>
>>101571747
I mean yeah but I'd love to just have a convenient library of any character imaginable like Chub does.
>>
Does anyone have an estimate of when Llama will be made available with vision features? I estimate in half a year maybe?
>>
>>101571704
How large we talkin'?
>>
>>101571791
Are you in a hurry? Let's say by Nov 21st... yeah... that sounds right...
>>
>>101571704
thanks for the input. i have an a6000 and two 3090s and am considering replacing one of them with another a6000 since i can get it locally for cheap-ish. figure if there's any possibility of it speeding up inference for models that fit within the two a6000s, i may as well pick up a bridge for another $200.
>>101571738
with 1x a6000 + 2x 3090 for the same amount of VRAM, on mistral-large-instruct-2407 5bpw on exllama2 i get:
>Metrics: 264 tokens generated in 37.01 seconds (Queue: 0.0 s, Process: 19 cached tokens and 4597 new tokens at 543.18 T/s, Generate: 9.25 T/s, Context: 4616 tokens)
>Metrics: 242 tokens generated in 95.18 seconds (Queue: 0.0 s, Process: 1178 cached tokens and 30830 new tokens at 508.06 T/s, Generate: 7.02 T/s, Context: 32008 tokens)
have not tried other formats with this model yet. i imagine 4x 3090 would be similar-ish in speed, since my understanding is that while the a6000 is a little slower (lower memory clocks/bandwidth), splitting layers across a fourth 3090 might introduce more overhead.
>>
>>101571822
I don't remember the specific numbers (and it was months ago anyways) but (for llama.cpp with --split-mode row) it was basically the difference between effectively unusable and faster than a single GPU.
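Roughly what I mean, as a sketch (wrapped in Python just so it's copy-pasteable; flag names from memory and the model path is a placeholder, so check llama-server --help on your build):
[code]
# launch llama.cpp's server with row splitting across GPUs,
# which is the mode where NVLink traffic should actually matter
import subprocess

subprocess.run(
    [
        "./llama-server",
        "-m", "models/mistral-large-q4_k_m.gguf",  # placeholder path
        "-ngl", "99",                              # offload all layers to the GPUs
        "--split-mode", "row",                     # split each tensor across cards (default is per-layer)
        "-c", "8192",
    ],
    check=True,
)
[/code]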
>>
>>101571831
I wanted to buy extra RAM just for the new llama models but now that they don't have my favorite feature I'm retiring the plan.
>>
3.1 70B seems retarded. Like way dumber than 3.0 of the same quant. I'm using exl2 so it's not a llamacpp issue. But maybe exl2 inference is broken as well? dunno.
If not, this model is shit and a big step down in intelligence from the previous version, regardless of what benchmarks say. Thank god for Mistral I guess.
>>
>>101571884
>If not this model is shit and a big step down in intelligence from the previous version,
That's simply not true, you just have false expectations.
>>
>>101571878
May as well get the ram anyway. Even if it's not that useful now, it will be in the future.
>>
>>101562692
Does that really work with ST? I'm trying to get it connected but it's just not connecting.
>>
largestral has that llama1 vibe
>>
>>101571911
In the future it will be obsolete. Imagine stacking up P40s last year and now they're relics. Better off saving up and purchasing whatever the cheap option is when you need it.
>>
File: mini-magnum.png (526 KB, 1024x512)
So? How slop is it?
>>
>>101571951
Explain every part of this meme
>>
File: f45.png (152 KB, 559x556)
>>101571951
>A new general purpose instruction dataset by kalomaze was added
>>
>>101571950
We're talking about ram, not gpus. If you need new gpus in the future, whatever you spent on ram would be negligible.
>>
>>101571944
is this a good thing? it doesn't sound like a good thing...
>>
>>101571951
Surprisingly good, for a small model of course. It's still 12b so don't expect any fireworks here.
>>
i have an i7 4790k and an rtx 3060, what should i upgrade to run bigger llms? i am sick of running small models
>>
File: Capture.jpg (38 KB, 748x775)
Are all finetunes the same?
>>
>>101571951
It's alright. It's extremely fast while still being fairly coherent
Unlike Gemma 2 it's not broken, but that might just be the quant I downloaded
>>
>>101572108
>Gemma 2 it's not broken
Not broken gemma 2 when?
>>
File: 1721945184604416.webm (214 KB, 576x1024)
>fell for the 64GB of VRAM meme
>can't run largestral without lobotomizing it in perplexity or throughput
>>
>>101572125
time for another 3090
>>
>>101572129
two more 3090s
>>
>>101572101
saochads how did we lose to fucking drummer
>>
TM3090s.
>>
>>101572028
Same thing applies. No sense in stocking up on DDR4 now when DDR5 is already out and getting faster.
>>
>>101571531
sad larp but I'm still ODing on that hopium
>>
>>101572163
I mean DDR5 isn't backwards compatible so if DDR4 is what your motherboard takes then that's what you have to buy, there's no choice involved
>>
>>101572061
What's better about it over regular nemo-instruct?
>>
>>101572134
nine more can't hurt
>>
>>101572171
Forever locking yourself to sub second token generation speeds. See why this is a bad idea?
You buy DDR4 now, it will always have the same speed and lose value on top of it.
Or you can save your money and purchase DDR5 next year for the same amount. Even if you need a new mobo you come out ahead.
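Napkin math if you don't believe me (rough sketch; theoretical peak bandwidth numbers, real throughput is lower):
[code]
# token generation is roughly memory-bandwidth bound: every token has to read
# (approximately) the whole quantized model from RAM once.
bw_ddr4 = 2 * 3200e6 * 8            # dual-channel DDR4-3200: ~51 GB/s theoretical peak
bw_ddr5 = 2 * 6000e6 * 8            # dual-channel DDR5-6000: ~96 GB/s theoretical peak

largestral_q4 = 123e9 * 4.5 / 8     # ~69 GB for a ~4.5 bpw quant of a 123B model

print("DDR4:", round(bw_ddr4 / largestral_q4, 2), "t/s best case")  # ~0.74
print("DDR5:", round(bw_ddr5 / largestral_q4, 2), "t/s best case")  # ~1.39
[/code]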
>>
>>101572163
I feel it's the same mentality as the people who, 2 days after the 4090 released, asked whether they should buy it or wait for the 5090. If you're still on ddr4, spend that money upgrading to something with ddr5. Or wait until motherboards with ddr6... and then you may as well wait for ddr7 or whatever... if you want to spend only a reasonable amount of money, upgrading ram is an easy choice.
My point is that you should upgrade if you can, in any way you can. If you have enough for a ddr5 setup, do it. If not, you probably won't do it in 3-4 months either. By then everyone will be waiting for ddr6 or whatever new shiny thing is in the making.

>>101572171
Again. If you need another CPU+mobo for ddr5, whatever you spend on a few sticks of ram will be negligible. You can sell the old pc whole.
>>
>>101572204
It's roughly as smart as Mistral's instruct, but it's smuttier and more Claude-like, more creative. That's an achievement because models often get dumber when smut tuned. This one didn't.
>>
>>101572252
Obviously this only applies if you're stocking up on server ECC ddr5 ram and not the consumer stuff. You're never going to run a >70B model on your 256GB ryzen build at acceptable speeds even with ddr5.
>>
>>101572299
>256GB ryzen
Can I really use 4 sticks of ram? I heard there were issues with it. I have 2x48 at 6000 now.
>>
File: pj.jpg (46 KB, 500x384)
>>101572101
>no bobby sinclair
garbage
>>
>>101572274
I feel like you missed the part where the feature anon wants isn't out yet. It's one thing to buy and use it now, versus buying now and sitting on it for a year when you could just buy it then.
>>
the 70b llama 3.1 is really good, I am using a 4.5bpw quant and it is the best local model I have used. It is even outperforming the miqu mixes as a sillytavern chatbot
>>
>>101571959
[left]: A feline-like creature known as petra (/lmg/'s mascot) confidently strides in front of Alan Turing
>>
File: anna.png (78 KB, 870x240)
huh, nemo is the first model next to CAI's that I've seen use OOC notes. neat.
>>
>>101572367
bullshit
it's shit, way dumber than 3.0
>>
>>101572386
Yeah, why can't they release an 8x12b nemo? Then I could finally replace wizard 8x22b.
>>
>>101572337
Then what difference does it make *when* it will release? What matters then is *what* it needs to be usable. If the point of saving money is to buy an extra gpu later, buy the gpu later. If the point of saving money is to upgrade to cpumaxx, cpumaxx later. I guess I'm just dumbfounded by poorly worded, round-about questions.
>What do you guys think the requirements for llama-vision are going to be?
is a better question. But even then, I'd object to the usefulness of the question when nobody can know now, barring some leak or whatever.
>>
File: 11__00875_.png (1.99 MB, 1024x1024)
>>101572386
It responds really well to OOC: instructions too.
But yeah, it's the first local model to do it unprompted for me; it complained that my replies weren't detailed enough.
>>
>>101572448
>it complained that my replies weren't detailed enough.
sovl...
>>
>>101571586
The first LLM trained exclusively on fake pajeet product reviews.
>>
>>101571842
I think I was getting slightly below that on 4x3090. So yeah it's a pretty good comparison.
>>
Wow Erebus sucks ass
>>
>>101572502
why are you using erebus in the year of our lord 2024
>>
>>101571531
Looking forward to seeing the first open-weight LLM release in years from OpenAI.
>>
>123B
>405B
How am I supposed to use these huge models (locally) in a cost-effective way?
>>
>>101571531
come on leakers are more anonymous here, you can be a little more detailed than jimmyapples/flowersfromthefuture
>>
Nemo 12B is unironically smarter than L3 70B 3.1
>>
>>101572516
>Cost effective
You don't.
>>
>>101572510
I googled sexo models and it was the first result
>>
>>101572502
It's hard to believe how far we've come since I tried Erebus back in 2022 after c.ai got lobotomized for the first time. I wrote off local models as an option at the time.
>>
>>101572536
This is some bullshit. How much for a server with 6 TB of RAM? GPUs are clearly out of the question.
>>
File: 1720802926533214.jpg (11 KB, 225x225)
>>101572516
>cost-effective way
>>
>>101572528
Yeah but the old llama 3 70B is smarter than nemotron. So meta fucked something up there. Also 3.1 8B has a better conceptual understanding than Nemo. I'd still more likely use Nemo for rp though because 8b is slopped to hell
>>
>>101572557
who is that? it's jensen's clothes but the face is clearly a different guy
>>
>>101572560
Maybe the distillation process is flawed?
>>
>>101572557
I feel like the more I buy, the more cost-effective it becomes.
>>
>>101572552
I did some testing on q4xs 405B on my home rig, which has 8xDDR4 2666 and it took nearly an hour to generate a 512 token reply.
Like I said. You don't.
Either rent cloud for it or someone probably has some kind of proxy setup that exploits the hf chat endpoint.
>>
>>101572560
>Yeah but the old llama 3 70B is smarter than nemotron. So meta fucked something up there.
I've been saying this but this thread is being belligerent about it for some reason
405B is good but the smaller 3.1 models have something badly wrong with them, they're all significantly dumber than the previous version
>>
>>101572589
>8xDDR4 2666
What kind of speed is that? I think the CPU guide has DDR5 at about 800 GB/s or something.
>>
>>101571531
you can say this for every week this year, between all the actual releases, copypaste chinese followups, placebo finetunes/merges, and cope papers with halfbaked githubs attached that faggots get excited about for one day
>>>(MIKU SYSTEM NOTE: THIS RECENT HIGHLIGHT THREAD SUMMARY WILL BE IN ALL CAPS.)
>>
>>101572591
8B is definitely smarter but more slopped than the old one. But it doesn't respond well to additional fine-tuning. 70B is just lol.
>>
>>101572641
Here's Teknium (the guy who makes the hermes models/dataset) saying 8B 3.1 is doing worse for him on every measure
he has other posts about it too
https://twitter.com/Teknium1/status/1816514230595784969
>>
I apologize for not searching well enough but do you guys know what's the best "chinese" model around 7-10b?
I'm trying to learn it and it would be really helpful to have a few cards to help me out with it (coherence, pinyin, meaning, etc).
I know about Qwen2, Deepseek v2 lite and this finetune: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat
But I'm not capable enough to know which one to pick or if there are better alternatives.
>>
>>101572733
Try Yi and GLM.
>>
Anyone know how to get xtts streaming working in ST? It appears as if ST waits for the entire gen to finish, then sends that off to the xtts server, and then the server sends over the result whole. When I select the streaming option in ST, I just get silence. And when I use the streaming flag in xtts, it gives me an error saying something about invalid sample rate.
>>
>>101572634
>THREAD SUMMARY WILL BE IN ALL CAPS
And in Russian.
>>
>>101572746
I suppose that asking for a definitive answer is a pretty tough ask given how niche my question is + how muddy the meaning of "best" is in this context.
Well, I'll give all of them a try then, thank you for the suggestion anon.
>>
llama 3.1 70b felt smarter than base llama3 for me but also infinitely worse when it comes to SHIVERS and the like.
>>
https://x.com/InfernoOmni/status/1816492686087508174
>guys this is actually INSANE. a former employee of a multi-billion dollar company, Runway, confirmed that they mass downloaded YouTube videos in order to feed their AI. there's a spreadsheet with NOTES showing HOW they swiped videos. Nintendo was on the list.
whats going to happen to them bros??
>>
>>101572829
Time to mysteriously disappear from the face of the earth.
>>
>>101572829
Nothing. We knew about this years ago. Google were doing it all along.
>>
>>101572844
I wish I was a sociopath CEO so I had the balls to do stuff like this myself
>>
>>101572589
>an hour to generate a 512 token reply
>0.14 token/sec
That might be usable depending on the quality and what I need from it. How much did that system cost?
>>
>>101572847
>Google mass downloaded YouTube videos in order to feed their AI
Oh they're in for it now.
>>
>>101572829
WTF this is bullshit none of those youtube authors consented to having their content watched and learned from without even paying them for it
lawsuit time!!
>>
>>101572829
Despite all the confident talk there STILL hasn't actually been a legal test case to determine and set precedent on the question of whether showing copyrighted media to a transformer or diffusion ML model counts as copyright infringement.

Both sides of the issue talk about it as if the question's been settled in their favour, but they both know it hasn't been. That's just what you do when a legal question is still open, pretend it's closed in order to appear confident.
>>
>>101572875
The videos literally belong to them.
>>
>>101572876
I consented. I have a youtube channel with two videos from 10 years ago with 300 views
>>
>>101572876
Now that I think about it, is using someone's youtube videos even disallowed in any way? Google says they're copyrighted.
>>
>>101572890
>upload video to youtube (for free)
>youtube creates subtitles for you
>download subtitles and use that for your dataset
The perfect crime.
>>
>>101572885
I mean, copyright probably shouldn't exist. Japan allowed it, I think. In a civilized world it would be allowed.
>>
>>101572904
I'm sure that if google is using them, then somewhere in the terms of service they agreed to there's a clause that lets them use the videos any way they want.
>>
>>101572890
Not my videos. I put a disclaimer in the description of all my videos that I do not consent to giving ownership to YouTube. They are merely a hosting provider.
>>
>>101572929
I mean using someone else's videos
>>
>>101572944
Again, I'm sure part of the TOS is that google can use the videos any way they want, and that includes selling rights for other companies to train AI off of them. Nothing is free; people just don't bother reading terms of service and don't realize that they are the product.
>>
been away for a bit
llama 3.0 70b q4 -> 3.1, is it worth the upgrade at all? can't be bothered to redownload that much if the upgrade is marginal
also the new mistral large, any means to run it on a 3090 & 64 GB RAM and is it better than llama3?
>>
>>101572977
There's some question whether it's better as some people are getting bad benchmark scores from it. May or may not be loader bugs.
However it is not made for RP. Mistral Large does RP. It can be loaded with a Q3 quant if you want, but it will not be fast. You will probably get like 1 t/s or less.
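Rough size math so you know what you're signing up for (sketch; the bpw numbers are approximate, check the actual file sizes on HF):
[code]
# quantized model size is roughly params * bits_per_weight / 8 (KV cache comes on top)
params = 123e9                              # Mistral Large 2
for name, bpw in [("Q4_K_M", 4.8), ("Q3_K_M", 3.9), ("Q2_K", 3.0)]:
    print(name, round(params * bpw / 8 / 1e9), "GB")
# ~74 / ~60 / ~46 GB -- so even Q3 mostly lives in system RAM next to a single 24 GB 3090,
# which is why generation ends up around 1 t/s or less
[/code]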
>>
File: tos.png (36 KB, 788x378)
>>101572936
Not into reading the ToS, I suppose. You don't need a disclaimer for that, but they can still do pretty much anything with them. Especially serving them, unless you make them private.
>By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, transferable, sublicensable licence to use that Content (including to reproduce, distribute, modify, display and perform it) for the purpose of operating, promoting, and improving the Service.
>>
>>101573016
meh, i already get slow ~1T/s speeds with llama3-70b, as long as the results are good and the model doesn't need tardwrangling too much i can be patient
i'll try mistral out then, thanks anon
>>
>>101572977
Been running it at Q2_K on 64 GB RAM, it's less retarded than I expected and I get about the same speed as a 70B at Q5. I think it might be my favorite RP model and replace CR+ for me.
>>
Anyone with 48gb running Mistral Large EXL2? Which quant and settings? I can't get this fucker to load with more than 3584 context size.
>>
>>101572970
I just didn't (don't) get what will happen to me if I use someone else's copyrighted video content
>>
>>101573059
Made with imatrix?
>>
I wish my eyes had a gleam in them...
>>
>>101573071
kek
>>
>>101573071
lmao
>>
>>101573068
Didn't check, quant from https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF
>>
>>101573060
3584 is all you need
>>
>>101573060
Have you considered being less poor?
>>
>>101571366
What's the best local model for ERP with 8GB VRAM?
>>
>>101573107
Snowflake Arctic Instruct
>>
>>101573059
Also something weird, it seems to be extremely deterministic, not sure if it's because of the quant. I'm using 4 temp and 0.03 minP to get some variety between rerolls.
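For anyone wondering what that combo does, this is min-p in a nutshell (rough sketch; real backends may apply temperature and the cutoff in a different order):
[code]
# min-p keeps only tokens whose probability is at least min_p * p(best token),
# then renormalizes; a high temperature then spreads mass over whatever survived.
import numpy as np

def sample_min_p(logits, temperature=4.0, min_p=0.03, rng=np.random.default_rng()):
    z = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()   # cutoff relative to the most likely token
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
[/code]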
>>
>>101572298
it's actually too horny imo and a lot dumber in certain situations.
>>
>>101573107
With 8GB you're better off using ram as well.
>>
>>101573157
Won't replies be really really slow?
>>
>>101573167
Not if you don't offload too many layers to RAM
Nemo 12GB at Q6 should be quite usable in llamacpp with partial cpu offload
>>
>>101573215 (me)
oops *12B, not 12GB
>>
>Karpathy and other niggas been shitposting about tokenization in Transformers, spamming that "is 9.11 > 9.9" meme like it's the fucking "arrow to the knee" of AI
>Llama 3 rolls out with some new tokenization shit, claiming better text compression but makes L3-405b parse Markdown like a down syndrome kid
>FAIR was already on this shit a month ago with multi-token prediction in Meta Chameleon, but Llama 3 paper doesn't even acknowledge it
Are these AI labs just circlejerking about scaling and dataset quality instead of actually fixing tokenization?
>>
whatever happened to chameleon? has anyone written about using it for anything?
>>
>>101573356
Tokenization is not a fixable problem. Different problems require different tokenization. We cannot have one to rule them all.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1ebz4rt/gpt_4o_mini_size_about_8b/
>gpt-4o mini is 8b
foss j33ts lost.
>>
>>101573501
Go back
>>
File: 1719850934570705.jpg (566 KB, 1792x2304)
>>101571366
>>
>>101573501
brain dead redditard
Go back
>>
>>101573501
> 8b
The journo just made that shit up.
>>
>>101573060
I suggest getting 2 more GPUs.
>>
What kind of videos would you gen now if you could?
>>
>>101573675
Full anime episodes from story descriptions.
>>
>>101573675
I'd use AI to rewrite Eragon to not be shit (no turning into an elf, fucks the dragon) and use that script to make a whole damn series.
>>
>>101573356
I have a possibly too simplistic thought that a model figuring out on its own that strawberry has 3 r's in it is a nice intelligence milestone. It knows what the alphabet is, so it should just apply that knowledge. I would also be happy with an LLM saying that it actually doesn't know because it uses tokens instead of letters. But we probably will never see a next token predictor get to this level.
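e.g. (sketch; assumes you have transformers installed and some tokenizer on disk, the model name is just an example and the exact split varies per tokenizer):
[code]
# the model never sees letters, only token ids, which is why counting r's is hard for it
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
print(tok.tokenize("strawberry"))   # a couple of sub-word pieces, not 10 letters
print("strawberry".count("r"))      # 3 -- trivial in code, opaque through tokens
[/code]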
>>
Would Mistral Large with the lowest 1 bit quant still be good compared to Nemo?
>>
I like Nemo. Wish it was smarter. Frogs, make a MoE out of this please.
>>
>>101573772
No
>>
>>101573784
But, the filesize is still larger. 25 GB vs 12 GB (Q8).
>>
>>101573772
>bitnet
Only thing that would be better is a miqu bitnet.
>>
>>101573799
Oh, no, I'm talking about the IQ quants. 1 bit was probably the wrong wording.
>>
>>101573799
1 bit quant != ternary != bitnet
>>
>>101573772
why don't you try it out
>>
>can run CommandR
>can't run Mistral Large
AAAAAAAAAAAAAAAAAAAA
>>
>>101573832
Cohere will save us next week, believe it!
>>
File: file.png (31 KB, 1284x627)
>>101573811
Rest of the range is debatable.
>>
>>101573829
Ok fine. Downloading.
yolo
>>
>>101572448
It swiftly becomes obnoxious after the first OOC. The model takes the online RP forum schema too seriously, and because mistral does not specify roles in its prompt formatting, it begins criticizing both the user and itself over and over again. I had to add 'OOC' to the stopping string list.
>>
>>101573832
>can probably run it at exl2-4.0bpw, but with only 2GB free
No... it's over...
>>
>>101573832
Just stop being a ramlet
>>
What preset should I use with mistral nemo?
>>
>>101573167
Define slow? If you're happy with 1-8T/s depending on model size then you should be fine.
>>
File: 0.webm (3.74 MB, 1280x720)
>>101574063
>What preset should I use with mistral nemo?
have you tried the vg/aicg anon's preset for mistral large? >>101567703
>>
>>101574140
I'm using the shitter vramlet version
>>
>>101574158
your shittier vramlet version is hornier, which is a good thing
you can use presets that are designed for different models. you don't really lose too much other than a bit of context applying a jailbreak to a model that doesn't need it
>>
What's the best uncensored model currently? NAI doesn't cut it anymore, i also have a 4090 so i could do local if i have to
>>
>>101574239
>What's the best uncensored model currently?
claude 3 opus or sonnet with a prefill/jb from /aicg

>>101574239
>i could do local if i have to
a single 4090 doesn't cut it anymore, but you can try out a gemma-27b or a mistral-nemo model
but if you're gonna use claude you won't be able to handle local's retardation
>>
>>101574263
how do i use claude uncensored? Can i just sub on the site or is there a diff version somewhere? Can it handle lolis?
>>
>>101574239
The question is not "what is the best uncensored model", it is "what is the best model", and for your hardware that's this one:
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/blob/main/gemma-2-27b-it-Q4_K_M.gguf
>>101574063
The template in its tokenizer_config.json.
>>101573832
You can certainly run the model below, and it's a good one.
https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q4_K_S.gguf
>>101573781
Good prose, poor intelligence and world knowledge.
>>101573772
Likely not. Models degrade too much below 2-bit no matter how good the quantization algorithm.
>>101573675
Pornography of the same type that you can find on xvideos. And I would feel like a vermin about it.
>>101573501
Redditors are retarded and can't read.
>>
>>101574330
Gemma is dogshit
>>
>>101574341
You just don't know how to use it with whatever bastardized prompt you are giving it.
>>
>>101574283
>how do i use claude uncensored?
openrouter if you want to pay, or use a proxy from /aicg/

>Can it handle lolis?
its the smartest and most creative model out right now. it still doesn't make children act realistically but if you like the idea of horny brat children its fine
>>
>>101574349
>SARRR YOU ARE REDEEMING IT THE WRONG SARR
>>
>>101574349
NTA but anons always say that then never provide their prompt. A model that needs some magical mystery prompt to be good (?) is no use.
>>
>>101574330
>The template in its tokenizer_config.json.
Presets can also mean sampler settings. I *wish* those were in models. Or at least recommended starting points.
>>
>>101572386
Theme pls
>>
>>101574439
They recommended temp at 0.3 or 0.4. They cannot recommend other settings for every inference program out there. It's not reasonable and nobody would agree with them if they were included.
>>
>>101572448
are you telling me that "ahh...ahh...mistress!" isn't good enough? preposterous
>>
>write author's note saying character has k-cup breasts
>tease bot about her huge breasts
>they say they are not that big, only a K-cup, not even C-cup yet
So this is the power of AI roleplay...
>>
Anyone using nemo on sillytavern? What context and instruct do you use? Just Mistral?
>>
>>101574566
yes
>>
>>101574330
>Pornography of the same type that you can find on xvideos.
Except it could be any anime character you want
>>
where can i generate porn videos?
>>
>As for the uptime of your computer, I'm pleased to inform you it's been running for a lovely 6 days, 4 hours, and 44 minutes. You know what they say: "Idle hands are the devil's playthings," but in this case, an idle computer is merely a testament to its owner's questionable life choices.
kek i'm enjoying this discovery too much, it even knew that my load average was low.
>>
File: 0.webm (1.78 MB, 1280x720)
>>101574655
>where can i generate porn videos?
nowhere good, but klingai.com is your best bet for free text2video of any kind
>>
>>101574655
In like a few months. Or maybe a year?
>>
>>101574675
Quit posting these goblins
>>
>>101574695
ill get bored in 2 weeks or when they tighten the filter as a result of my degeneracy, whichever comes sooner
>>
I put on my robe and wizard hat


