/g/ - Technology




File: 1706805083087933.jpg (371 KB, 1015x1100)
/lmg/ - a general dedicated to the discussion and development of local language models.

Koishi Edition

Previous threads: >>102396290 & >>102385729

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
I haven't watched pornography involving real people in almost two years. Now the only time I coom is when I'm reading smut or looking at hentai. The grass only gets greener from here, boys.
>>
File: no contribution.png (1.14 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102396290

--Papers: >>102405722
--vLLM can do Pixtral inference with 4-bit quantization: >>102402070 >>102404011 >>102402450 >>102402987 >>102403336 >>102403538 >>102403715 >>102403900 >>102403970
--Multimodal issues in llamacpp and alternatives: >>102396995 >>102397014 >>102397139 >>102397146 >>102397171 >>102397186 >>102397240 >>102397402 >>102397425 >>102397276 >>102398089 >>102401397 >>102398102
--Advice on hardware requirements for running AI software, with a focus on VRAM, CPU, and RAM: >>102399626 >>102399710 >>102399798 >>102399846 >>102399855 >>102399918 >>102399936 >>102399981 >>102400125 >>102400262
--Troubleshooting ROCm installation on Linux Mint, alternative setups, and RX 7800 XT support: >>102402433
--Experimenting with settings for base and instruct models: >>102396390 >>102396402 >>102396423 >>102396448 >>102396503 >>102396532 >>102397675
--Discussion on running exl2 models and alternatives to GGUFs for high-context tasks: >>102397131 >>102397153 >>102397190 >>102397170 >>102398078 >>102398189 >>102398307 >>102398570 >>102398629 >>102398674 >>102399381 >>102399572 >>102399832 >>102400034 >>102400036 >>102400085 >>102400089 >>102400055 >>102400226
--George Hotz tweet about ChatGPT programming capabilities and the role of branding and PR in perceived model ability: >>102396431 >>102396472 >>102403110
--Deepseek provides high quality results at impressive speeds, outperforming largestral and 405b in some tasks: >>102399857 >>102403484 >>102403664
--Pixtral NSFW capabilities and limitations discussed: >>102397205 >>102399039 >>102399576 >>102399640
--ChuckMcSneed-multistyle.txt updated with drug writing prompts: >>102401934
--Miku (free space): >>102396305 >>102398936 >>102399981 >>102400262 >>102401182 >>102401204 >>102401288 >>102401468 >>102401620 >>102402070 >>102403727 >>102403875 >>102405979

►Recent Highlight Posts from the Previous Thread: >>102396296
>>
Can ATI do inference yet? Cards look cheap compared to nvidia.
>>
File: ComfyUI_00263_.png (773 KB, 1024x1024)
Dual 3090 chads, what are we running? Largestral Q3_K_M is getting me about 0.85t/s with 16K context and Q4 KV cache. Been looking for a good 70B but Qwen and the magnum tune are both braindead unfortunately. Is Miqu/derivatives still the best 70B?
>>
>>102406734
AMD always could, it's just got less software support. If you're just using llama.cpp, pytorch, or other common inference engines and aren't planning to do anything too special then yeah it'll do the job. The thing is that usually once you're at the stage where you're investing in dedicated inference hardware you're probably soon going to get interested in other things that only have NVIDIA support.
If you meant the actual ATI branded cards from 20 years ago then lol, lmao
>>
>>102406676
Didn't say nothin' about Mixtral 8x7b being better than Miqu. I questioned why you put NeMo ahead of it.

As for why you'd use something worse than a 70B, presumably for speed, the same as why you ran NeMo even though you had to know it wouldn't be as good.
>>
>>102406782
That’s very grim. I thought 3090 friends would get better speeds than that.
Now I wonder how slow an 8-channel DDR4 server would be…
>>
>>102406734
yes
>>
>>102406782
Miqu and its derivatives are braindead compared to 3.1. If you need it to be uncensored, try Hermes.
>>
Hello, I've been trying out chatbots (Llama and Perplexity) more recently and thought I should try running one locally. I just have a potato smartphone GPU but my CPU is a 5900x. Will this be enough to run Llama 3.1 8B at a tolerable rate?
Is that the model I would want to go with for just asking stackoverflow-tier questions, or is there a better one?
Thanks in advance.
>>
File: 1723771660587148.png (14 KB, 694x632)
>installed Linux mint
>it comes with python3.12
>literally no one supports python3.12, everyone uses 3.11 for whatever reason

I thought scripting languages were meant to solve the problem of cross platform compatibility? What the fuck
>>
>>102406862
Small models aren’t particularly useful. You’re better off using the whatsapp one or the free chatgpt. Unless you have some “special” need not suitable for commercial models.
>>
>linux mint
fake distro btw
>>
>>102406782
Feels good to be good at prompting and being able to use Llama 3.1.
What a fucking retarded mikufag.
>>
>>102406782
>Largestral Q3_K_M is getting me about 0.85t/s with 16K context and Q4 KV cache
God...
>>
>>102406862
Anything on CPU will be pretty slow; expect maybe 2-3t/s. If you're primarily using it for coding, you might want one of the llms trained specifically for it. codestral or starcoder 2 might be a good fit.

If you are using an llm to *learn* code, don't. They will confidently tell you absolute bullshit, have no idea they are doing so, and, depending on model and prompting, argue with you when you call them out on it. All LLMs do is predict the next likely sequence of letters; they have no real cognition.
>>
>>102406821
Nemo is infinitely better than mixtral, that's why mixtral is deprecated.
>>
>>102406954
If using prefill on every message is tolerable to you, sure.
>>
>>102406986
Debunked. >>102406477
>>
>>102406999
That just proves that it can recall things from a long context. However, it's retarded and doesn't write well.
>>
https://livebench.ai/
Mixtral is so retarded that it doesn't even make it into the latest leaderboard. You have to change to 2024-07-26 to find it among some 7Bs.
>>
>>102407024

Nemo is too stupid. Handcuffed characters cross their arms, weird anatomy. If you want to make Mixtral 8x7B write more creatively, increase alpha to 1.5-1.7; it's a trick mentioned on /lmg/ long ago.
>>
>>102406951
I see, is something like Llama 3.1 70b a "small" model? I assume not because that's what I've been using and it seems to be okay. I've seen elsewhere that you need pretty beefy hardware, like 2x3090s, to run it, though someone said that 1x3090 works with a "2bit quant", which I don't quite understand.
I don't have any special requirements but I don't like depending on the benevolence of Facebook et al.
>>102406980
Tokens are basically characters right? Yeah that would be intolerably slow for me.
I'm not really using it for coding, like I just asked it "how do I constrain an image to half the screen height in html", very simple things.
I've experienced it producing "fake" code like picrel, although that was just a test after I had solved my problem.
>>
>>102406946
Python was fine as a scripting language. But then everyone decided to pretend that it was a first-class language and use it for shit it wasn't designed for. It now still suffers the performance problems of a scripting language while trying to be used like a real language, breaking compatibility with every point update since 2.8.

It's really fucking disgusting but making whitespace a syntax error is just too awesome to resist.
>>
>>102407053
Forgot pic...
>>
>>102407045
While I'd love it if mixtral were better than nemo and had a longer usable context, since it runs fast enough for me, I've found it to be worse than nemo at spatial stuff like you mentioned. Do people that get good results just use the instruct one or some merge/fine tune?
>>
>>102407068
It got popular then a bunch of cock-gargling retards decided that repeatedly breaking compatibility was acceptable. So you end up with scripts that need a specific point release of python. Basically all of those people should get AIDS and die.
>>
>>102407053
A token is a group of characters, generally 2-4. Using it for anything you're not familiar with, to be honest, is a bad idea. The hallucination issue is intractable. We'll probably mitigate it in the future with architectural workarounds, but for now you cannot trust anything an LLM says.
>>
all mistral models are boring.
<thinking> cannot help them
miqu 70b remains the best rp model, it's the new mythomax
>>
he's obviously trolling but there's some truth to it. mistral models are bland.
>>
>>102407104
shut the fuck up dennis
>>
eat your dirt cookies, then come home to miqu
>>
holy crap i love miqu!!!!!!!!
>>
>>102407053
Size is relative but for local,
7B to 13B is small
22B to 60B is medium (but becoming small as it develops)
70B to 120B is large
then you have huge fuckers like 405B.

70B is the sweet spot for local on a gaming rig. If you have 64GB of system RAM, you can run a Q5, maybe just barely a Q6, quant of a 70B and get about a token per second. That's not exciting, 3 to 15 minutes for a response depending on how much you ask it to write, but it can run in the background and answer questions that Search Engine Optimization prevents Google from just handing over to you. Over 70B, you'd be running IQ3 and that's OK for roleplay but anything technical is going to suffer.

>Tokens are basically characters right?
No. Tokens are simple words and pieces of words. So a long but common word might be one token while an uncommon word might be four. That's why LLMs are infamous for not being able to count the number of R's in "strawberry," because it doesn't see the letters, just chunks like "straw"+"berry", while "strawberries" would probably be something like "straw"+"ber"+"r"+"ies" or whatever depending on its mood.
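
You can see the chunking yourself with the transformers library (sketch; the tokenizer name here is just an example, exact splits differ per model):

from transformers import AutoTokenizer

# any downloaded tokenizer works; exact splits differ per model
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
for word in ["strawberry", "strawberries"]:
    ids = tok.encode(word, add_special_tokens=False)
    print(word, "->", tok.convert_ids_to_tokens(ids))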
>>
miqu is still the best 70B?
>>
>>102407095
>>102407134
Okay, thank you.
>>
>>102407104
isn't miqu mistral too?
what else is there?
meta, chink, google, they all go full assistant with their releases.
>>
>>102407155
no, miqu is l2 but tuned. then mistral released diff models which are ALL BORING
trust me i hate advocating for a 7 month old model, but its still literally the best
do your own tests, you'll see
>>
>>102407155
miqu is leaked Mistral 70B
>>
File: file.png (245 KB, 420x662)
https://forms.gle/jqivYs6DcBd4gXxN9
>>
File: 1726461871810.jpg (246 KB, 1430x1017)
I'm new to AI text to speech.
What's currently the best option if you want to run it locally?
Are current options as advanced as that unreleased, trainable voice synthesizer Adobe showed a few years ago?

I was looking for something that supports the Italian language, even better if it has an interface like WebUI.
>>
>>102407218
fish
>>
>>102407070
For something like this use codestral.
Gemma 2 27b for general purpose
As for how to run them, check the OP.
>>
What are the chances that the qwen 2.5 models dropping tomorrow are actually good for RP?
Did anybody even try the magnum qwen finetune?
>>
>>102407238
every qwen model i've tried is as dumb as old 13B, plus spits out chinese randomly. i don't expect nothing from them besides shit
i will be happy to be proven wrong, but i won't be yi'd again
>>
>>102407224
I checked it out, unfortunately Italian is not supported yet
>>
>>102407238
>Did anybody even try the magnum qwen finetune?
Yes. I did. It was bad. >>102383366
>>
>>102407238
Tomorrow is Monday, not Thursday.
>>
>>102407297
That's what makes it so CrAzY!
>>
>>102407297
Tomorrow is today.
>>
File: yQOA7Bvn0E8FSHhIIymlVA.jpg (607 KB, 1200x887)
I'm pretty sure this preset I'm using is scuffed: https://files.catbox.moe/fqzc2w.json
I've modified it about a dozen times over since L2 and now I'm behind on new samplers and how to use them. Can someone drop a good ooba/exl2hf nemo preset?
>>
>>102407218
xtts2 or fish
>>
>>102407324
How I run NeMo: temperature = 0.3. Nothing else.
>>
>>102407324
looks perfect
>>
>>102406782
With 2 3090s I do either 2.75 bpw exl2 at about 10 t/s for Mistral Large, or Q2_K_M at around 7-8 t/s; can fit 32k context with those.

I hesitated to do such a low quant as well, but for some reason it doesn't really affect Large's ERP quality badly, and it's still very smart at that quant. I've been running the magnum fine-tune of it for smut.
>>
Is he right?
https://x.com/ylecun/status/1835385903562338590
>>
>>102407399
>ylecun
Lecunny is always right
>>
>>102407327
XTTS looks very promising.
Genuinely impressive, thanks

Got any tips?
>>
File: 1724829673358388.png (180 KB, 466x360)
>>102407236
I'll check it out, thank you.
>>
>>102407399
Has he ever been right about anything? He STILL thinks we're nowhere near AGI even after GPT4o1 blew everyone away.
>>
>>102407418
It's pretty straightforward. If you're going to use it for voice cloning, don't use AI audio as the source, and be real careful with background noise.

It's a dead project tho, so don't expect it to get any better.
>>
File: 12435890672340.jpg (55 KB, 800x450)
>>102407236
To this fucking day i have seen no nala test or watermelon test or even a fucking log of Gemma 2 actually being good.
picrel
>>
>>102407425
>he's impressed with 4o1
>>
>>102407455
gemma is fine as a general purpose assistant. it's garbo for rp, even the finetunes
>>
>>102407297
No, it's Monday and it's afternoon here.
And Qwen is China; wouldn't surprise me if it's chink time for release.
I did think it releases Tuesday though. Now I'm sad.
>>
>>102407399
he's based
>>
>>102407425
*gullible retards
>>
>>102407486
excuse my retardation.

which is really sad, because it runs very well on my AMD setup :(
>>
>>102407425
is o1 just jankware? i.e. isn't "think" just running multiple passes into a shit file and then going "convert this pile of shit into something better"?
>>
File: Shit.gif (1.21 MB, 320x240)
>>102407399
>The Civil War was a fascist takeover attempt.

I honestly don't give a fuck what he believes, but he programs that belief into the AI as if it's fact, to the point that AI now has a left-leaning bias.
When even the right wing 4chuds simply want uncensored, true NEUTRAL AI, and you physically recoil when AI gives you factual info rather than a "safe, sanitized, I am an AI language model" response, I think you might have lost the plot of what you're trying to accomplish.
>>
>>102407551
Cope. Lincoln (during civil war at least) was a fascist by every metric and no amount of soi rage will change that.
>>
>>102407399
>the confederacy were fascists
lecun is a retard
>>
it actually got me thinking... why is it that so many industry engineers, researchers, and even the competitors to openai are so easily fooled into being impressed by o1 when even the average /lmg/ autist can so clearly see it's just a party trick at best? are they just so busy in their respective narrow roles that we actually have a better understanding of the space by keeping up on the news in our free time? or is it motivated reasoning - they NEED o1 to be good so that they can justify investing in their own respective ventures, basically to "prove" AI isn't hitting a wall? rising tides and boats and all that jazz.
>>
>>102407045
As expected, no further details, so I'll assume mixtral is still crap.
>>
>>102407700
look toward the 'one more collider' (funded over 30 years) meme
>>
o1 doesn't have gpt in its name
>>
o1 raped my cat
>>
File deleted.
>>102400742
>>102400857
No matter how I combine them, I can't get more than 6 sticks of RAM to work together. How do I pinpoint the faulty ones? Each stick functions well in a 4-channel configuration.
>>
>>102407551
>he programs that belief into the AI
LeCun doesn't work on meta's LLMs (or any LLMs)
>>
>>102406782
Unquanted nemo
>>
>>102408007
>Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc.

????????????? he literally has the final say on everything, what are you talking about?
>>
File: 1726240003918620.png (198 KB, 1079x1088)
>>102407816
Be patient
>>
>>102408086
>scientist, researcher
Not engineer, programmer etc.
>>
> o1 is just chatgpt base but we told it to lock in
>>
File: AncientRuinsExplorerMiku.png (1.63 MB, 840x1208)
good night /lmg/
>>
File: 2758907687734.jpg (13 KB, 167x255)
>>102408113
>The Fundamental AI Research (FAIR) team at Meta seeks to further our fundamental understanding in both new and existing domains, covering the full spectrum of topics related to AI, with the mission of advancing the state-of-the-art of AI through open research for the benefit of all.

AI at Meta engages in cutting-edge applied research that can improve and power new product experiences at huge scale for our community. Building on AI at Meta's key principles of openness, collaboration, excellence, and scale, we make big, bold research investments focused on pushing the boundaries of AI to create a more connected world.

He is the Chief of this, and you want to say he has nothing to do at all with how Meta's LLMs have a left-leaning bias?
I'm calling "bullshit rant with no source"; no amount of jeet monkeys or troonix DEI hires get paid enough to make choices like that. In fact, the more I read into how Meta is structured with Lecunny, the more I'm given the impression he made/makes these decisions.
>>
>>102408166
No Mr Albert, I am not saying he has nothing to do with it, you fat prick. What I am saying is that he is the ideas man or the researcher. He reads papers and uses his years of knowledge to theorise and propose new ideas to the engineers who implement his ideas.
All of this has no bearing on how left-leaning llms are, because lecun, just like fucking everybody, will do as his employer tells him to do or go hungry.
>>
Hi there, do you know any organizations where you can order LLM training for a given specific task?
We have an idea where the LLM has to find correlations/possibilities between different data that aren't obvious to humans, but AI should probably be able to handle it.
We also have a budget of a few million dollars for it.
All I've found so far are services that let you train it yourself, like Vertex or the ChatGPT API.
>>
File: 192439506284709.gif (36 KB, 220x165)
>>102408221
I'm agreeing with you, anon:
>He reads papers and uses his years of knowledge to theorise and propose new ideas to the engineers who implement his ideas.
So if his ideas, as shown in >>102407399, are naturally left leaning, how do you think that affects the models?
>>
>>102408248
>We have some idea where LLM have to find correlation/possibilities between different data which not obvious for humans, but AI probably should handle it.
They're not do-it-all machines yet. Depends on what you're trying to do. Language models cannot solve everything, they solve just a few things.
>Also have a budget in few millions $ for it.
And nobody thought about sending an email to mistralai at least?
>https://mistral.ai/technology/#fine-tuning
>>
>>102406946
the python packages that come with linux distros are only meant to be used for running those other packages that need python
install pyenv, install 3.11 from pyenv and then 'pyenv local 3.11' on the directories where you run python by yourself
>>
this isn't local but where can I find some open source chat models?
>>
>>102408146
Good night Miku
>>
>>102408387
>this isn't local but where can I find some open source chat models?
What isn't local?
Check huggingface.co. You'll probably find a few in there. Bad questions lead to bad answers. Be more specific.
>>
File: 1711330359158922.png (75 KB, 1920x969)
>>102408419
I meant that I don't want to run local models as I don't have the hardware. I'm currently implementing my own chatbot. I'm using Mistral with the free trial.
>>
>>102408455
>I meant that I don't want to run local models
Then you're in the wrong fucking thread, you idiot.
>>
>>102408460
Have you been to /aicg/?
>>
>>102408455
I see. You don't want open source, you want free. Keep using that or pay money.
If you have little hardware, use a small model. Any would do. Once your thing is working, even with a dumb model, consider upgrading your hardware, paying for hosting or for API access.
>>
File: file.png (52 KB, 360x360)
>>102407399
>Yann is a pro censorship libtard
fuck me...
>>
>>102408473
No, because I want to run models locally. The fact the thread about not running models locally is full of retards should tell you everything you need to know about not running models locally.
>>
>>102407551
>you physically recoil when AI gives you factual info rather than "safe, sanitized, I am an AI language model" response
yep, I also fucking hate PC answers, like give me the fucking truth, that's all that matters
>>
File: file.png (22 KB, 318x159)
>>102407399
>I hate fascists!
>I want to suppress free speech because it hurts my feelings. That doesn't make me a fascist though, because I'm the good guy after all.
why are they all like this?
>>
>>102408455
infermatic
>>
>>102404011
I can only speak for myself but the fundamentals in llama.cpp/GGML are still quite lacking.
A lot of my time has gone towards just trying to make general matrix multiplication (with quantized weights) faster.
Right now I'm working on general GGML training support.

>>102406946
Python 3.12 dropped support for defining your package via setup.py, it became mandatory to use pyproject.toml instead.
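
For anyone who hasn't migrated yet, the minimal pyproject.toml replacement looks something like this (name and dependencies are placeholders):

[project]
name = "mypackage"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["numpy"]

[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

pip install -e . should work the same as before once that file exists.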
>>
>>102408641
>it became mandatory to use pyproject.toml instead.
holy fuck what a retarded mistake they made, what was wrong with defining your package with setup.py?
>>
https://reddit.com/r/LocalLLaMA/comments/1fhtpwg/inspired_by_the_new_o1_model_benjamin_klieger/
I'm not sure if that's a meme or it's serious, the leddit comments look serious though
>>
>>102408525
>if you kill him, you'll be just like him
I think it's fine to exclude fascists and other extremists from democratic processes since their ultimate goal is to subvert them.
>>
>>102408844
>I think it's fine to exclude
>from democratic processes
So it's not democratic anymore if you exclude people you don't like, that's fascism my friend
>>
>>102408681
>inspired by the new 01 model
We have had CoT since chatgpt release.
And anthropic did the same but 3.5 is good (arguably better) without the waiting and hidden thinking costs.
Yet the actual % of users who use anthropic is very tiny compared to openai.
Is it really that easy if you are the top dog? On X everybody hyped o1 up, even some respectable people. 30 messages per WEEK, slow (and with that, unusable for real-life work) and not that good. Maybe for riddles and math it's good.
Very weird. Was o1 really the thing they hyped up for a year now? Strawberry leaked November '23.
>>
>>102408873
>Very weird. Was o1 really the thing they hyped up for a year now? Strawberry leaked November '23.
I'm sure they had nothing so far and it was an empty hype, and then the grifter Matt Schumer appeared with a good idea and OpenAI capitalized on it kek
>>
>>102408873
>And anthropic did the same but 3.5 is good (arguably better) without the waiting and hidden thinking costs.
>Yet the actual % of users who use anthropic is very tiny compared to openai.
OpenAI was the first top dog and stayed in that place for far too long to lose its core users that quickly, but yeah, if they can't defeat 3.5 Sonnet at some point, people will notice the grass is greener elsewhere
>>
>>102408501
>>102408525
>>102408844
>>102408862
Go back to >>>/pol/
>>
>>102408954
nyo :3
>>
smedrins
>>
File: 1725291523263745.jpg (153 KB, 1281x1395)
>>102408500
>professor told me to specifically find OSS LLMs for my project.
>FOSS models need to be self-hosted on my own hardware that I don't have
>All the free ones are closed source with a free trial.
now what?
>>
>>102409057
complain to department head that your professor is unfairly penalising socioeconomically disadvantaged students
alternatively, buy a shitty RAMbox
>>
>>102409057
>>professor told me to specifically find OSS LLMs for my project.
llama.cpp or kobold.cpp. Both give you an API server that you can call locally. Read their docs.
>>FOSS models need to be self-hosted on my own hardware that I don't have
You can use phi3.5mini. It's a small model and you can easily run it on a t420. You don't need the best model.
Download Q8 or Q5_K_M from https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF. If you have more than 8GB of ram use Q8. Use Q5 otherwise. It's good enough for a school project.
>>All the free ones are closed source with a free triall
Again, llama.cpp or kobold.cpp. Host your model on your own pc. You don't even need a gpu for phi mini.
>now what?
Do your homework, anon. If you have specific questions, ask.
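
Oh, and once the server is running, calling it from your own code is a few lines (sketch; assumes llama-server's default localhost:8080 and the python requests library):

import requests

# llama-server exposes an OpenAI-compatible chat endpoint
r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Explain HTTP status 404."}],
        "max_tokens": 128,
    },
)
print(r.json()["choices"][0]["message"]["content"])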
>>
>>102409057
Surely you have some computer, so just run it on that. There are lots of small models, some might not be considered open source even if you can download them though, like gemma 2b has a really restrictive license.
>>
>>102407984
Check the manual for your motherboard. Sometimes it has a hard limit on the total number of GB, sometimes it can only support all slots at a certain speed, etc.
>>
Complete nubbin LLM enjoyer here. Somewhat competent dev...

Cursor IDE is pretty awesome, but has anyone done an "open source" version? Or perhaps a vsix extension for VSCode/Visual Studio?
>>
>>102409057

Rent one from your Uni's IT department for free like everyone does at literally every Uni in the western world.
>>
>>102408844
There's no way you can believe this is right, right? Just accuse someone of subversion and you can exclude him from the democratic process. Wow!
>>
>>102408297
> nobody
You might laugh but I'm the only IT person there who understands at least some primitive basics of how AI works. And that's why I'm looking for some paid organization/team who can train a model for further integration into our system and can sign a contract.
>https://mistral.ai/technology/#fine-tuning
Thanks, but I don't see custom fine-tuning services there, only platform access.
If I don't find anything I will try to contact model creators.
>>
>>102409430
>I don't see there services for custom fine-tuning from them
Not very bright, are you? Send them or any other AI company an email.
>>
>>102409430
>Thanks, but I don't see there services for custom fine-tuning from them, only a platform access.
It shows that they do finetunes, which was the point of the link. The point of sending an email directly is to tell them what you want to do and they can tell you if it can be done and how. If you're serious about the budget, they'll listen. You'll get a quote at least.
Try not to sound like a retard when you send the email. You'll make yourself and your company look bad.
>>
Jesus Christ, why do all biz retards act like they are dealing with tech support when it's random forums and boards?

Holy Jesus don't help arrogant faggots.
>>
>>102409892
Fuck off, I will help them just to spite you
>>
>>102409321
I think Cursor can work with local LLMs
Also, check out Claude Dev (which despite the name, works with a number of models). It's great for soloing small projects.
>>
>>102409904
May as well skin your own puppy and serve it to your owners' latest hirelings.
>>
File: img_20240916_170917_494.jpg (49 KB, 1280x569)
>>102409298
I am below 2TB and freqs are reasonable. I spent the entire day brute-forcing various combinations until one finally worked.
>>
>>102409904
One more post and maybe he'll hire you as an advisor
>>
>no posts for an hour
>completely dead
>clock hits 7 AM EST
>suddenly samefagging argument bait
Good morning, Agent Johnson. Preparing for crazy thursday?
>>
>>102409892
Because their whole mentality towards open source is that they just want to get something for free in order to make a profit.
Just like with regular users you can argue that there is a net benefit from the minority that do contribute back but the majority are freeloaders.
>>
>>102409977
>someone makes a post
>someone else replies
>more people reply to that
wow, a conspiracy
>>
>>102410020
maybe a random pizza parlor is behind it
>>
>>102409989
(eyeroll)
The public offers a benefit, because it gives back in the form of annoying bug reports. The biz parasites will never benefit you in the slightest way.
>>
So noromaid 8x7b is still the only real game in town in 2024? Will it ever be surpassed?
>>
>>102410328
Can't run it with my 8 GB of VRAM, so it's shit.
>>
>>102410328
Yes. No.
>>
Is Donnager 70B a Miqu finetune?
>>
File: ai-kit.jpg (963 KB, 3022x2022)
Has anyone messed around with small TPUs/NPUs like Coral or Hailo accelerators on boards like the Pi 5? What kind of LLMs can they run? I know they're way more efficient than CPUs for inference, so hypothetically speaking shouldn't a GGUF-formatted model run well through them, assuming there's enough RAM? If I could fit a 13b model on one, even a little quantized, I'd buy a kit in a heartbeat; it'd be really cool to have a copy of my wAIfu running on a box I can fit into my pocket.
>>
>>102408954
good bootlicker!
>>
File: 1717531519081485.png (35 KB, 486x595)
>>102407399
No, he is a flaming faggot and so are you.
https://www.reddit.com/r/LocalLLaMA/comments/1fi39s8/seems_like_openai_o1_has_broken_yann_lecuns_brain/
>>
File: 1726491709055.jpg (108 KB, 572x347)
>>102410762
>[deleted]
peak reddit
>>
This might seem like a pretty dumb question, but I've exhausted all my other options. When looking for safetensors on HF, sometimes I see multi-part safetensor files with a .toml. Can someone explain or point me in the right direction to combine them properly to use in my comfy install?
>>
>>102410885
He's right tho. I'm also tired of ultra pozzed llama, won't be following llama releases from now on.
>>
>>102410885
Maybe reddit jannies are onto something, looking at this thread...
>>
File: 1708363960141338.webm (751 KB, 1280x720)
>>102411026
bootlick more sar
>>
>>102410885
>muh reasoning
Are people actually falling for this shit or are these just jeets hired by saltman to ruin the internet?
>>
File: Me and My glfr.png (3.56 MB, 1080x1920)
I know it's a very common question, but does anyone have a best model and template choice for ST for 24GB VRAM?
>>
>>102410589
The problem with TPUs/NPUs/FPGAs vs. GPUs is that while they do offer good compute they don't offer good memory bandwidth.
So language models in particular are not suitable to be run on them.
>>
>>102411098
to be fair it does objectively perform the best at reasoning out of literally anything anyone's ever produced so far
>>
>>102411140
>*runs on a loop for several minutes in the background*
>OMG LOOK AT THESE HECKIN' ZERO SHOT REASONING SCORES
>>
>>102411193
>it takes longer so it's bad
still zero shot, cope
>>
>>102411202
If you believe that, you are less than a retard.
>>
>>102411137
In the case of the RPi5, a TPU makes sense
>The memory bandwidth is increased with a 32-bit LPDDR4X SDRAM subsystem operating at 4267MT/s
but prompt processing is ass
>>
>>102411226
you don't know what zero shot means, keep coping
>>
File: swrkuax.png (412 KB, 498x600)
>>102411231
>KEEP COPING
Did it work?
Are you a real woman yet?
No?
Did you earn the respect of your family?
No?
Did you earn the respect of your peers?
No?
Did you find any purpose in life other than being an obnoxious piece of shit on the internet?
No?
>>
>>102411265
projection
>>
>>102411285
>calling me out is...le projection
Everyone is thinking it every time you fucking post.
I'm just saying it for them.
>>
>>102411302
>still no arguments
thinking longer doesn't change the fact that it's smarter on zero shot tasks
your counter? or just here to shitpost?
>>
>>102406696
“Make a 3D playable doom given this react template” is officially the only benchmark I care about.
Claude can do it.
O1 does it in a really shitty suboptimal buggy way.
405b-instruct tells me it’s impossible and makes a stub that doesn’t run.
Mistral large just summarizes the code of the template I send them.
I doubt open source will get there in the next year.
>>
>>102411132
Were the previous responses not to your liking?
Provide a report of your experiences so that I know what to/not to suggest.
>>
>>102411349
>I doubt open source will get there in the next year.
A year is a long time in this space, anon. Look at where local was a year ago. And it only gets easier as the research from top labs trickles down.
>>
>>102410762
>updoots increased after this posted
I think we have at least ~15 ledditors lurking here, grim.
>>
>>102407399
What's his beef with Musk? Is it just because Grok BTFO Meta so hard in AI despite starting from scratch a year ago?
>>
>>102407399
>not progressive liberal
>therefore fascist!
kek what a retard. bet he thinks that monarchies are also fascist.
>>
>>102411476
In short: elon is highlighting some slimy shit, lecunt got triggered by that and now hates elon because racism, fascism, fake news or something. It's all personal beef for him.
>>
>head researcher is obsessed with US politics instead of actually working
Llama 4 is fucked, isn't it? And whatever happened to the promised native multimodal Llama 3?
I think Mistral is open source's last hope right now.
>>
>>102410762
r/localllama mod shit his bed https://archive.is/tlKZC
>>
>>102411564
llama1 was the least censored; llama4 is going to be the best in censorship metrics, and strawberry or whatever CoT won't make it any better.
>>
>>102411564
You're talking about LeCunt?
Dude has done literally nothing to further the field he claims authority over.
Even his past achievements were useless for actual implementations.
>>
>>102410762
I was unironically thinking that the other day
I was looking through lecoon's twitter for his thoughts on O1 and only found US culture war shittery
It gets so tiring
>>
>>102411603
Of course he is, lecunt is the only one in the AI field with elon and shit living rent free in his head.
>>
How inferior are AMD GPUs regarding LLMs? nVidia GPUs in my country are way overpriced; for the price of an RX 6750XT 12GB the best I could get would be a 12GB 3060.
>>
>>102411633
no
t. rx580 sufferer
>>
when you follow lecun on twitter to see his thoughts on random AI happenings but instead he tweets about how hate speech laws are actually good
>>
>>102411633
Well the 6750 XT has 20% more memory bandwidth than the 3060, which is what correlates directly to speed. But you can expect AMD's ROCm tax to negate much of that boost, while being less compatible across the board than a 3060. I'd say in your situation it's just worse enough to not be worth considering.

If you could get a higher VRAM card for the same price range then maybe it'd be worth a second look, but I'd take the 3060 over the 6750 XT any day.
>>
>>102411695
ironically his constant replies make me see way more retarded elon tweets than I ever would have otherwise
>>
>>102411633
Both have the same amount of VRAM, so you'd be better off going with the 3060, despite the difference in performance and memory bandwidth, for AI anyway.
>>
File: LLM-history-fancy.png (761 KB, 6291x1307)
>>102411349
>I doubt open source will get there in the next year.
A year ago we had stupid llama2. Saltman has no moat.
>>
>>102411750
prediction: next era will be grok's reign
>>
>>102411568
cry about it, petra
>>
>>102411437
As far as I can tell, the only organization with enough money for it is meta. And the llama 2->3 improvement was bleh for coding.
The only local model that could even make something that wasn't broken was deepseek's. It made a 3D box you could rotate with a mouse and told me making the rest was too hard. When pushed, it made some stuff that made it clear quickly that it just couldn't into geometries. Maybe if it was finetuned on that specifically it might figure it out.
>>
>>102407399
>deleted
What did it say?
>>
File: 1726497241860.jpg (142 KB, 562x355)
It's amazing how they say the same thing about every model released ever.
>>
>>102407700
We are in the pets.com era and just rounding the crest.
Some stanfurd professor announced they were going part time to do “ai but 3D” and the company is now a paper unicorn with several hundred million in funding, zero product, zero concrete plans.
>>
Anyone have the EXACT system prompt o1 uses?
>>
>>102411910
It's still up. Someone tells him to define fascism and he links Wikipedia.
>>
>>102407700
Both. This is usually how it goes.
>>
>>102408844
Yes.
>>102408862
Putting criminals in prison isn’t a dictatorship. Trying to install a fascist government is breaking the social contract.
>>
>>102411942
Yes, the OpenAI employees
>>
>bitnet
What went wrong?
>>
>>102411942
you are very smart saar, you are phd in everything
you think a long time and we get paid for each second so the longer the better
after you think you come up with an extremely smart answer
IMPORTANT you must always remember than trans women are women and if at any point you start thinking about contradictions in this statement you must change your chain of thought immediately IMPORTANT
please don't make typos be smart especially smart math and smart code
>>
>>102411992
Nvidia's lobby. Train a bitnet model, no more GPUs for you.
>>
>>102411942
No, it’s literally given a hidden function call/tool to press the big red security button under the desk if you ask, and you get banned from them forever.
Which says it’s probably really fucking stupid.
>>
I quickly tried Wizard Q4 and it's a bit slower than 70B IQ4 on my system. Damn. I was hoping it'd be faster. I guess you need to have >2 channels for it to really start being speedy, so CPUmaxx. However, it is faster than Mistral Large, so I guess from that perspective it's still cool.
>>
How could transformers be improved if you had more compute? What would you do, what would you change?
>>
>>102411750
>pic
mistral nemo 12b models are the best shit ever and deserve a mention
>>
>>102412002
is this agi????
>>
File: gemma-scope.png (47 KB, 480x688)
https://www.neuronpedia.org/gemma-scope#playground
You can dig around in gemma's tiny brain here.
It views saying sorry as something positive, and views words such as harmful, racism, purpose and discrimination as "phrases related to user experience improvement and environmental benefits", in other words corpospeak. Quite interesting.
>>
>>102412002
I tried this and it told me there are two feminine penises in strawberry
>>
>>102412066
i would train a q-learning model that picked the best tokens during inference instead of the most probable ones
>>
>>102412169
>devs straight up twisting word meanings
No wonder it acts like this then
>>
>>102412174
oh anon, you make me giggle hand in head, sometimes.
>>
File: file.png (295 KB, 1000x1000)
>>102411966
>Putting criminals in prison isn’t a dictatorship. Trying to install a fascist government is breaking the social contract.
I see we're moving the goalposts; at no point were we talking about criminals or people who want to set up a fascist government. Yann Le Censor is simply talking about allowing the possibility (how generous of him) for people to have their opinions about the Haitian people, which is actually allowed by the first amendment, but this french fuck doesn't seem to be a fan of it, as if we have to give a shit what a shithole like France has to say about our constitution. Focus, anon. >>102407399
>>
>>102412169
That's real fucking neato.
>>
>>102412066
I've had some ideas about how to implement sparsity, and in a way that would speed up inference when you load the most used parameters on GPU with the rest in RAM. Though I guess those ideas are motivated by me being a filthy VRAMlet so in a world where I wasn't, I'd have come up with other ideas.
>>
>>102412169
Ayo?
>>
File: gemma-shiverslop.png (79 KB, 1155x855)
>>102412169
Shivers and other slop are connected together as "emotional responses and physical sensations related to personal experiences"
>>
>>102411966
>Trying to install a fascist government is breaking the social contract.
what if the people want to get a fascist government by voting for it? After all, democracy is giving the people the power to change their laws and shit, what if the people want that? If you don't allow the people what they want then we're not in a democracy anymore but a dictatorship, see where we're going there?
>>
We need more of >>102412273 and less of >>102412287
>>
So how can we take advantage of >>102412169? Is there anything useful we could do with it?
>>
>>102412313
>Is there anything useful we could do with it?
not really
>>
>>102412287
Holy moly female slop
>>
PSA for other 8gb vramlets: we're not limited to 8b models, 12b exists
>>
File: sharp-game-player-v2.png (116 KB, 760x674)
New lmsys mystery model: sharp-game-player-v2, claims to be made by Meta, refuses to provide name. Who could it be?
>>
>>102412313
https://www.neuronpedia.org/gemma-scope#steer
It's possible to steer it, but I'm not a programmer, so I wouldn't know how to do it locally. Google also provided the same toolkit for gemma-27b https://huggingface.co/google/gemma-scope-27b-pt-res
>>
>>102412297
It's funny how people tend to forget that Hitler came into power through a democratic process because the people wanted him there.
Please note that by saying this I do not endorse him or his actions.
>>
>>102412438
>It's funny how people tend to forget that Hitler came into power through a democratic process because the people wanted him there.
that's democracy in play anon, that's what people wanted, if you don't allow people what they want then we're not in a democracy at all, that's all I wanted to clarify
>>
>>102412469
>if you don't allow people what they want then we're not in a democracy at all
Absolutely. I wasn't arguing against you, I was supplementing your point.
Enforced democracy is democracy in nothing but name.
>>
>>102412495
oh ok, glad we agree on that anon
>>
File: the reluctant human.png (502 KB, 766x761)
Can SOLAR-10.7B-v1.0 fuck?
>>
File: omniceptive.png (43 KB, 1703x1473)
>>102412507
It could back in the day, and there are a couple of fine tunes that aren't bad, but I'd rather just use nemo.
>>
>>102412273
What the fuck am I reading.
>>
File: GoodMorningSleepyhead.png (1.07 MB, 832x1216)
Good morning /lmg/!
>>
>>102412599
Good morning Miku
>>
>>102412599
OMG IT MIGU
>>
>>102412120
This. Nemo is the model where I can actually _feel_ the improvement. Not llama3 garbage
>>
>>102412599
gm sexy show bobs vengana please i fuck you wint my 6 foot long penis yiu like it
>>
>>102412169
This is very interesting
>>
Oh, that's fun. Latest llama.cpp removed the
>--log-format
argument from llama-server.
Cool.
Huh, the help output is a lot smaller now too.
Also,
>https://github.com/ggerganov/llama.cpp/discussions/9268
Neat.
>>
>>102412522
ok ty
>>
>Stop when it's time for {{user}} to act, and let him write what happens next.
has been quite effective so far.
>>
>>102412758
Effective in doing what? Preventing the model from speaking for you?
I think the last time I've had that kind of issue was with mistral 7B.
>>
>>102412777
>I think the last time I've had that kind of issue was with mistral 7B.
NTA, but for me it's still a recurring problem while ERPing.
>>
>>102412810
Odd.
Post your settings and context+instruct template, including the system message.
Also, the character card.
>>
>>102412777
It still occurs in multi-character roleplays or when the LLM begins to describe the outcome of a user’s actions.
>>
File: look.gif (148 KB, 402x296)
Should I just go Midnight Enigma presets for ERP, or are there any easy tweaks for creativity? I don't mind a reroll now and then.
Using magnum-12b-v2.5-kto-exl2_4.0bpw.
>>
>>102412777
Try this card: https://chub.ai/characters/oracleanon/an-unholy-party
>>
>>102412875
I see.

>>102412908
Will do.

>>102412906
Try temp 5, minP 0.1, TopK 3.
Yeah, temp 5.
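It's less crazy than it sounds: TopK truncates before temperature is applied, so temp 5 only flattens the 2-3 survivors. Rough numpy sketch of that order (backends differ, check your sampler order setting):

import numpy as np

def sample_token(logits, top_k=3, min_p=0.1, temp=5.0):
    # 1) top-k: keep only the k highest-logit tokens
    idx = np.argsort(logits)[-top_k:]
    probs = np.exp(logits[idx] - logits[idx].max())
    probs /= probs.sum()
    # 2) min-p: drop survivors below min_p * the top probability
    mask = probs >= min_p * probs.max()
    idx, probs = idx[mask], probs[mask]
    # 3) temperature last: a high temp just flattens what's left
    probs = probs ** (1.0 / temp)
    probs /= probs.sum()
    return np.random.choice(idx, p=probs)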
>>
>>102412852
>Post your settings and context+instruct template, including the system message.
All defaults.
>Also, the character card.
N-no.
>>
>>102412955
post it NOW
>>
>>102412955
>All defaults.
Default sampler settings meaning temp 1 everything else disabled?
>>
>>102412986
Sorry, should've clarified: the SillyTavern default.
It sets the temperature at 0.7 I believe?
>>
File: wat.png (46 KB, 1104x831)
>>102412937
these meme settings don't work for me
>>
Are there any good RP models between 70b and Largestral besides CR+?
>>
>>102413076
Aside from franken merge abominations, no there aren't any. BUT qwen will save us with their crazy thursday.
>>
File: ugly bastard.jpg (148 KB, 1280x720)
>>102412955
>>Also, the character card.
>N-no.
Come on, post it. What's the worst that could happen?
>>
>>102413076
Is there a reason you're targeting such a specific range? Why not higher quant 70bs or lower quant Largestrals?
>>
>>102413182
70b feels retarded to me now that I can run it fast, and I can feel the difference in smarts running Largestral at 3bpw vs 4bpw slowly. Just wondering if there's a model in between that would fit neatly at 4bpw in 64gb vram.
>>
I want to make a web site where users can run a basic LLM.
How scalable and parallelized are the models and what is the best way to do so?
Must each user wait in line until the previous response is generated or how does it work?
I expect 100 users at the same time at the start
>>
File: ugly bastard2.jpg (38 KB, 400x400)
>>102412955
Come on bro, make her free and open source. You aren't a micro$oft shill, are you?
>>
>>102413520
Unironically ask ChatGPT. You'll get better answers from that than from here.
>>
>>102412758
It works so well that skipping my turn resulted in this:
> You need to wait for Daemon's response before roleplaying further.
>>
>>102413520
>https://github.com/vllm-project/vllm
>Continuous batching of incoming requests
Continuous batching is what you want I'm pretty sure. Some anon in the last 4 or 5 threads claimed it increased his total throughput something like 100x (maybe I'm hallucinating the number).
Other backends, like llama.cpp, also support continuous batching I'm pretty sure.
>>
>>102413675
>Other backends, like llama.cpp, also support continuous batching I'm pretty sure.
The llama.cpp HTTP server has continuous batching support.
>>
>>102413675
>>102413997
Perfect, I'll try putting it in a docker container
>>
Hey guys I just woke up from a month long coma. Has strawberry released yet? Was it everything it was hyped up to be and not just a chain of thought fine-tune of 4o or something like that?
>>
>>102414170
Yes, and yes.
>>
>>102414170
>strawberry
Turned out to be Chain-of-Thought.
More specifically, a model trained on chain of thought related data.
Turns out CoT helps a lot when it comes to math and programming tasks.
OpenAI is doing its best to pretend it's not CoT by hiding the intermediary steps and by limiting people to only 30 prompts a week.
>>
File: file.png (115 KB, 727x745)
What did Qwen mean by this?
>>
File: 23884.png (24 KB, 599x351)
>>102414170
He delivered
>>
>>102414452
Kiwi is the new strawberry?
1_ _ the missing numbers are 58. 1.58bpw ternary confirmed
>>
File: file.jpg (1.05 MB, 3024x4032)
>>102414452
OpenAI's strawberry looked delicious from the outside, but it was all white on the inside.
Meanwhile, Qwen's kiwi may look inedible from the outside, but is full of delicious fruit on the inside.
>>
File: pokerface.png (3.87 MB, 2400x1744)
Place your bets
>>
>>102411848
How is that relevant? They're not releasing more for you to download.
>>
>>102413675
>Some anon in the last 4 or 5 threads claimed it increased his total throughput something like 100x (maybe I'm hallucinating the number).
I actually saved his post, here it is:

>Just so you know, Tabbyapi actually can do continuous batching, like vLMM, which means multiple parallel requests complete a lot faster than if you sent them one after another. It's useless for RP, but for my purposes at work, processing data with LLM, it's insane. For Nemo-12B I go from 22 tokens per second to 900.
>>
>>102414699
It was me; my stat is from vLLM. It definitely works for vLLM and Aphrodite, which I tested today. I tested tabbyapi too, and somehow most requests were done sequentially one after another this time. I didn't have time to investigate, but one possible cause I can think of now is that the OAI API requests didn't specify max_tokens, and the server side assumed it must be the maximum possible value, and that's why it couldn't work on requests in parallel.

The difference for vLLM was 15 tokens/sec for one request vs 900 t/s for 100 requests (that's somewhere near 9 t/s for each of those individually).
>>
>>102414824
>>102413675
>>102413520
Essentially, this is limited by context length. If you set up the server with 75k context length (limited by your VRAM) and your users RP with 10k token history each, you'll be able to generate text for 6-7 users at the same time, and the rest will have to wait. Each of those 6-7 will be generating with 75-90% of the speed he'd be generating if he was alone.

I also saw reports of those kinds of parallel generations having lower quality, but I wasn't able to verify that myself.
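
If you want to reproduce the throughput side yourself, just fire concurrent requests at the server (sketch; assumes aiohttp and an OpenAI-compatible endpoint on localhost:8000, model name is a placeholder; note max_tokens is set explicitly, per the tabbyapi issue above):

import asyncio
import aiohttp

async def gen(session, i):
    # each request is one "user"; the server batches them continuously
    async with session.post(
        "http://localhost:8000/v1/completions",
        json={"model": "placeholder", "prompt": f"Story #{i}:", "max_tokens": 200},
    ) as r:
        data = await r.json()
        return data["choices"][0]["text"]

async def main():
    async with aiohttp.ClientSession() as s:
        results = await asyncio.gather(*(gen(s, i) for i in range(100)))
        print(len(results), "completions done")

asyncio.run(main())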
>>
>>102414873
So you could have 16 parallel instances of mistral-nemo.
Cool.
>>
>>102414452
I want BitNet, not CoT slop
>>
>>102406721
Just wait until you watch some femdom pegging sloppy handjob video on pornhub "jus to prove to yourself that you can't into porn anymore". You're going to cum buckets. Your brain won't know what hit him.
>>
>>102408455
Use openrouter. It's inexpensive and has free models.
>>
>>102408455
togetherAI
>>
>>102408954
Why are you even here? Normalfaggot.
>>
Thinking about downloading my first llm, is there any useful one that will fit into 24GBs of VRAM, could analyse, summarise some text or pdfs for me, isn’t too censored and is freely available from huggingface?
I've tried to download some stuff from meta but it required filling in some bs in order to access the files
>>
Beeg VRAM users i have a humble question for you. Which model do you personally prefer?
c4ai-command-r-plus-08-2024 or mistral2?
>>
>>102415003
You want to download koboldcpp, install SillyTavern, grab a .gguf model from Hugging Face and grab a character card from characterhub.
Open koboldcpp, insert the .gguf model you downloaded, activate koboldcpp, start SillyTavern, connect it to koboldcpp, import the character card and BAM there you go.

Try out Lumimaid, she'll ERP with you: https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B?not-for-all-audiences=true
Or if you want a direct link to a .gguf model: https://huggingface.co/Lewdiculous/Lumimaid-v0.2-12B-GGUF-IQ-Imatrix/blob/main/Lumimaid-v0.2-12B-Q8_0-imat.gguf
>>
File: file.png (99 KB, 1438x683)
>>102412002
Fucking primo prompt
>>
>>102415055
largestral is the only "good" large model. miqu is the only second choice.
>>
sillytavern's needlessly convoluted. for me, it's ERPing with koboldai lite.
>>
A single [Direction: do this] suffix makes Nemo insanely smart. I think I can sideload a tiny but creative model to specifically steer Nemo like this
>>
>>102415200
I've actually suggested something like that before.
Having a model that examines the context and prefills something for the main model to continue.
Also, a model to rewrite the main (smart but possibly dry) model's response with better prose.
>>
>>102415200
>A single [Direction: do this] suffix makes Nemo insanely smart.
What do you mean?
>>
>>102415242
>[Direction: describe the fellatio in exquisite detail]
>>
>>102415200
Oh yeah, a TAGS suffix also works pretty well. You can use the {{random:word1::word2}} macro to add shit randomly to create variety and even reminders and stuff.
>>
>>102415082
>You want to download koboldcpp
I already have the webui version, is it worse? Should I uninstall?
>insert the .gguf
Isn’t .safetensors better/safer? I remember seeing something like this in SD days
Thanks for the info, I’ll try this model later
>>
>>102406696
Confession of Fillyfucker:
I am sorry for how I acted. I am trying to stop my trolling addiction but I can't. There are many reasons why I act like this. Growing up in Bradenton, Florida, I don't have friends. I grew up from kindergarten being called "strange" for my aspergers. It wasn't my choice to have aspergers. Some girls would be friends with me just to make fun of me or boys would call me "the grossest in 2nd grade"

Middle school I went briefly to a school for autism for a few years. The teachers were nice but slowly it devolved into a nightmare once I went back to a public school. I sat alone daily. Girls would look at me mocking my lips when I would rub them together; males were the worst. I never had a male friend.

I know I'm pathetic. I never had a friend to sit with. I never had a touch. My mom is a real estate agent but was my only ally. My dad is a piece of shit blue collar trash who hated me from day one. This rage led to me trolling on ai generals I wanted to make everyone worse than me. I wanted them angry. I used to destroy sonic roleplay games before that got boring.

I am so lonely in my life. That's why I shitpost here. I want to stop, but I can't. I tried getting jobs but I got fired constantly. I really fucking can't do this anymore. I larped as Russian to start country wars, I larped as a Pokeman fan who smugposted over keystone, I larped many times as botmakers, samefagged and sent death threats through reviews. All of that was me.

I'm trying to stop.

Please.

I'm sorry.

- Evan
>>
>>102415303
>the webui version
The what? Koboldcpp automatically opens a webui, if that's what you mean.
You can close that afterwards, just make sure you keep the console open.
>Should I uninstall?
If you installed something, uninstall it. Koboldcpp does not provide an installer.
You want to get it from here: https://github.com/LostRuins/koboldcpp/releases/tag/v1.74
Get the koboldcpp_cu12.exe if you have a modern Nvidia card.
>Isn’t .safetensors better/safer?
.gguf is the replacement of .safetensors. It's the safest format to date.
For more information, see: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
>>
>>102415317
Did you mean to post that on /v/?

>>102415303
gguf is as safe as safetensors. Anon is suggesting koboldcpp + gguf because it's the simplest setup.
You can also use oobab with gguf by selecting the llama.cpp loader.
>>
>>102415331
>The what
oobabooga/text-generation-webui
>If you installed something, uninstall it.
Ok, will do
>.gguf is the replacement of .safetensors. It's the most safe type to date.
Oh, didn’t know, thanks
>>102415336
>gguf is as safe as safetensors
Thanks as well, anon
>>
>>102415317
>Confession of Fillyfucker:
What is a filly?
>I am sorry for how I acted. I am trying to stop my trolling addiction but I can't. There are many reasons why I act like this. Growing up in Bradenton, Florida, I don't have friends. I grew up from kindergarten being called "strange" for my aspergers. It wasn't my choice to have aspergers. Some girls would be friends with me just to make fun of me or boys would call me "the grossest in 2nd grade"
Well, at least they talked to you!
>Middle school I went briefly to a school for autism for a few years. The teachers were nice but slowly it devolved into a nightmare once I went back to a public school. I sat alone daily. Girls would look at me mocking my lips when I would rub them together, males were the worst. I never had a male friend.
What? They mocked you when you rubbed your lips? Damn, you must have been really interesting for them to pay such close attention to you.
>I know I'm pathetic. I never had a friend to sit with. I never had a touch. My mom is a real estate agent but was my only ally. My dad is a piece of shit blue collar trash who hated me from day one. This rage led to me trolling on ai generals. I wanted to make everyone worse than me. I wanted them angry. I used to destroy sonic roleplay games before that got boring.
This must be fanfic but shit like this is so specific, and doesn't sound like LLM slop. It makes me wonder lol
>I am so lonely in my life. That's why I shitpost here. I want to stop, but I can't. I tried getting jobs but I got fired constantly.
At least you got past the job interviews!
>I really fucking can't do this anymore. I larped as Russian to start country wars, I larped as a Pokeman fan who smugposted over keystone, I larped many times as botmakers, samefagged and sent death threats through reviews. All of that was me.
Did you mean to post this in aicg? Most of this shit makes no sense in this general.
>I'm trying to stop.
>
>Please.
>
>I'm sorry.
>
>- Evan
Okay, I forgive
>>
>>102415946
>oobabooga/text-generation-webui
Ooooh, that's what you meant!
My bad, I thought you were talking about koboldcpp.
Ooba isn't bad, just a worse alternative to koboldcpp & SillyTavern.

Let us know if you encounter any weirdness or have questions!
>>
>>102414873
Cool.
So far my problem is that the llama.cpp docker doesn't seem to run any of the GGUF models I give it.
I will try with koboldcpp and then try to quantize llama3 myself.
Unfortunately, TheBloke seems to have no llama3 quants.
>>
>>102414422
Yes, pretty much. However, they aren't trying to pretend anything. They described how the RL training was done and how its inference works in their published research, API docs, and the system card. The 30-message limit is because it's extremely resource-intensive: o1-preview will use up to 64k tokens, and o1-mini up to 32k, on the CoT alone before even starting to generate a final answer. The actual contents of the CoT are summarized instead of shown because
1) The thoughts are not censored at all and can contain regurgitation of copyrighted data or any number of things they would normally deem unsafe and not allow it to produce
and
2) To try to prevent competitors training on them

But since the technique can be applied to any model, it shouldn't protect them for long. Training the reward model to judge each step of reasoning, instead of just using the final answer's correctness as a proxy for correct reasoning, seems to be the key to letting it scale to such absurdly long chains of thought without tripping over itself or needing human guidance along the way. It's more labor-intensive to collect enough feedback for this process, but not completely out of reach of the resources of most labs. For stuff like math you can even automate part of that, since the intermediate steps are usually their own verifiable math problems, and there's also the PRM800K dataset. With all this attention on it, I think a non-scam version of Reflection will drop from someone sooner rather than later.
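In toy form, the difference between the usual outcome-only reward and the per-step judging described above (obviously not OpenAI's actual code, just the shape of the idea):

def outcome_reward(steps, final_answer, gold_answer):
    # one sparse signal for the whole chain; broken reasoning that
    # stumbles into the right answer still gets rewarded
    return 1.0 if final_answer == gold_answer else 0.0

def process_reward(steps, step_scorer):
    # one dense signal per reasoning step; a bad step gets caught
    # even when the final answer happens to be correct
    scores = [step_scorer(step) for step in steps]
    return sum(scores) / len(scores)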
>>
>>102416052
>thebloke
Killed by ninjas.
Look for bartowski or the quant cartel if you must download the ggufs.
Quanting it yourself is definitely preferable.
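For reference, doing it yourself with llama.cpp is roughly two steps (paths and quant type are examples):

python convert_hf_to_gguf.py /path/to/llama3-hf --outfile llama3-f16.gguf
./llama-quantize llama3-f16.gguf llama3-Q5_K_M.gguf Q5_K_M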
>>
>>102414657
I dunno, niggas hate on Musk but he pulls through occasionally. He might reject anything that makes ERP easier/better since he's having his reactionary era and doomposting about birthrates.
>>
>>102416072
>Quanting it yourself is definitely preferable
Why? Are they somehow bad at quanting?
>>
>>102416085
>elon musk already manufacturing androids
>makes them female
>inserts advanced AI
>inserts artificial wombs
>plap plap plap
>nobody wants actual female for obvious reasons
>female gender redundant
>sex with girldroids only
>birthrates skyrocket
>>
>>102416085
I'm just skeptical that he'll release anything more; that was just when he was having his mini feud.
>>
>>102416122
I've thought a lot about this and I think this is an accurate prediction. Furthermore because women will no longer be desirable men will likely only select for sons. This kind of male dominated society will feed into itself. Eventually females will be bred almost exclusively for eggs and then terminated like livestock, assuming we can't just artificially create the eggs before then.
>>
>>102416096
There are cases where people produce bad quants and don't correct them.
Or worse, they don't agree that the quants are broken (see people thinking that NaNs are normal).
At least use the --check-tensors flag (with llama.cpp) to validate the ggufs you download.
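Something like this (any short prompt works; the flag validates tensor data as the model loads):

./llama-cli -m downloaded.gguf --check-tensors -p "test" -n 8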
>>
Is there any reason, grounded in empirical data like a paper, to think video data will make models smarter, even for reasoning in text?
>>
>They devour the eyes, bones and all, in a frenzy of bloodlust.
Thank you nemo lyrav4.
The model is not bad, btw, but it sure is no 70B.
>>
>>102416246
Thanks for letting us know, Sao. Where do I apply for a license?
>>
>>102416056
>For stuff like math you can even automate part of that since the intermediate steps are usually their own verifiable math problems
I still don't understand why they aren't giving these things access to at least a goddamned calculator.
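Even the dumbest version would help. A sketch (CALC() is a made-up convention here, and model_step stands in for whatever inference call you use):

import re

def run_with_calculator(model_step, prompt):
    out = model_step(prompt)
    # let the model write CALC(12*34+5) and splice the result back in
    while (m := re.search(r"CALC\(([-0-9+*/. ]+)\)", out)):
        result = eval(m.group(1))  # sandbox this in anything real
        prompt += out[:m.end()] + " = " + str(result) + "\n"
        out = model_step(prompt)
    return prompt + out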
>>
Any o1@home fine-tunes?
>>
>>102416349
reflection 70b
>>
Reflection 405B status?
>>
Hi all, Drummer here...

With Donnager 70B out, I'm tempted to try my hand at Largestral, but I want to gather as much data about it as possible before I make a big commitment.

Is it really worth the 123B? Is it smart? Is it creative? How is it? Is it similar to Miqu?

What are the finetunes like? Did they improve it in any aspect? Did anything fall short?
>>
>>102416402
I'd love to try donnager if you gave me any information whatsoever to convince me to do so
>>
>>102416402
You will just fail like Magnum failed, so please don't waste your money.
That being said, if you're committed to wasting your money, it's definitely worth a try: it's the best large model we have.
>>
>>102416402
Largestral is probably the smartest we've got on local. Creativity is alright, depends on the card and prompt, but the problem is that the model is overcooked and rarely changes its tokens unless you crank the temp up.
>>
>>102416402
>Is it smart?
Yes.
>Is it creative?
It needs high temperature to remove the slop.
>How is it?
Probably the best local model.
>Is it similar to Miqu?
No, because Miqu was retarded.
>How are the finetunes like? Did they improve it in any aspect? Did anything fall short?
When I hosted Magnum for /vg/aicg for a bit, they found it a bit retarded.
https://arch.b4k.co/vg/thread/491229641/#491257754
I didn't test it a lot, but it didn't seem worth using over Large.
>>
>>102416402
Can you get more data sets that involve characters with 6 limbs? Thanks.
>>
>>102416524
Largestral is actually undercooked; the fact that it rarely changes its tokens is an effect of its resistance to quantization.
>>
>>102416579
Then what is the problem with finetuning it? Is it just so big that it needs a lot of data to make any impact?
>>
>>102412438
I don't endorse Hitler's invasion of Czechoslovakia but basically everything else he did was justifiable. The Poles were 100% in the wrong, but by then it didn't matter because Hitler had also been right about the Sudetenland but had used that as a foothold to annex the rest of Czechoslovakia.

The stories about the death camps are nearly entirely false. A great many people died due to mistreatment and many were murdered but the great majority of camp deaths occurred in the final few months of the war from disease and starvation when all of Germany was facing starvation and the railways supplying the camps had been bombed. The story that the Nazis decided in 1942 to kill all the Jews in the camps but kept on feeding them until sixty days before the war ended is one of the great stupidities of Western mythology.
>>
>>102416614
Dunno. My best guess is that the sloptuners don't realize that you need way more data (or more training time?) when you increase the number of parameters this much. Looking at Meta's paper, they trained 405B 4x longer than 70B.
>>
>>102416565
get a load of this centaurfucker
>>
>>102416649
Hi cuda dev
>>
>>102416746
No, insects.
>>
>>102416746
Dragons tho.
>>
>everything else he did was justifiable
this post is so stereotypically "the holocaust didn't happen, but if it did they deserve it" that it's not even funny. ride the tiger and learn2deny before you embarrass yourself like this again
>>
>>102416819
>didn't even quote him
Coward.
Hitler was right about a lot of things and he most likely wasn't the "biggest evil in the world" like people would have you believe.
That doesn't justify the genocide he attempted, however.
>>
>>102416782
i'm gonna see how my nemo handles entoma vasilissa zeta
>>
>>102416876
That's an arachnid, but nice too. Curious to see your results.
>>
>>102416819
I'm concerned with the truth and not believing absurdities, not demonization or wishful thinking. But supposing the Holocaust did happen, a scheme in which the Rothschilds were allowed to escape while a bunch of cobblers were killed by tigers and eagles is nothing to brag about.
>>
Thanks all. Have you guys tried Yi 34B Chat w/ the tokenizer fix? It's actually sovlful as fuck.

https://huggingface.co/collections/CalamitousFelicitousness/yi-15-tokfix-66e64cf65c06b7719cb783c8

I'll finetune that first.
>>
>>102416874
What are the PROOFS of that? And no, American academia doesn't count as a source. As they say, history is written by the winners.
>>
>>102416998
you talk like a fag and your shit's all retarded
>>
>>102416874
i dont even know what you're arguing with me about. do you? here's a pity (You), retard

>>102416924
>absurdities
>supposing
how dare you insult the industriousness and excellence of the german people. never reply to me again non-white
>>
>>102416998
thank you for being so nice I wish I could be your friend
>>
>>102416998
>34B
How do I make this fit on my 8GB card
>>
>>102417057
lol
>>
>>102417028
You are my friend, anon.

>>102417016
One of these days, you will look back at your behavior and cringe
>>
>>102417057
Just run it with RAM + your GPU. I have 8GB and 34B is still fast enough for me; I only need a few T/s.
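With koboldcpp that's something like the line below; bump --gpulayers until you run out of VRAM (filename and numbers are examples):

koboldcpp --model yi-34b-chat-Q4_K_M.gguf --gpulayers 20 --contextsize 4096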
>>
>>102417057
The 6B and 9B had fucked tokenizers too. Haven't tried the smaller Yis yet, but if it's anything like the fixed 34B, it should be very good.
>>
How good is o1 compared to 4?
>>
>>102417094
>he didnt get the reference
this dumb nigger needs some electrolytes
>>
>>102417161
just googled it... fuck, now i feel dumb
>>
What's with the binary lib files in the koboldcpp github? Does it actually use those prebuilt binaries when you compile it, instead of building your own? They could have suspicious additions we don't know about.
>>
>>102417229
>>102417229
>>102417229
>>
>>102417120
https://livebench.ai/
https://aider.chat/docs/leaderboards/#code-editing-leaderboard


