/g/ - Technology

So I got one of these things. It's got 8GB of RAM to use and all the CUDA whatevers. I've been toying around with ollama to run some models, but there is honestly so much out there it's hard to sift through.
4B seems to be the comfortable limit, at least with most things I've tried, but they all feel so samey and corporate.
Anything you guys get up to with small local models?

The AI chatbot general seems to just talk about online models, so I figured I'd start a new thread.
>>
>>108191985
Mistral is always a good bet on my end. Feels nice and talkative without having a lot of isms. Be sure to use a preset with sillytavern like the spaghetti one.
>>
>>108191985
>4b seems to be the comfortable limit
Only if you run it with full 16-bit weights. You can quantize them down to 6-8 bits with essentially no noticeable quality loss. At 6 bits you can fit a ~9B model in 8GB with no trouble.
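Napkin math, if it helps (weights only; KV cache and runtime overhead add a bit on top):
[code]
# napkin math: weight memory only, no KV cache / runtime overhead
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(weight_gb(4, 16))  # 8.0  -> a 4B model in fp16 already fills an 8GB card
print(weight_gb(9, 6))   # 6.75 -> a 9B model at ~6 bits leaves room for context
[/code]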
>>
>>108191996
ok so I need to go research what sillytavern is and what spaghetti means in this context...

>>108192018
So I've seen models in ollama that are "quantized", and I knew it had something to do with making bigger models run in less memory, but how does one go about doing that?
My assumption is that I would have to train the model myself? On ollama, if a quantized version is available you just pull it from there. How would one go about quantizing a bigger model to run?

I am now assuming that I would have to leave the ollama ecosystem to do this sort of thing.
>>
>>108191985
The question is what you're using the models to do; that's when your RAM etc. comes into question.
>>
Another general that can answer your questions:

>>>/g/lmg
They're the dedicated local model general.
>>
>>108192071
Never used ollama myself, but from what I've heard it has options for downloading quantized versions of some models. If you use llama.cpp directly, it comes with a tool for quantizing models (no training required). But usually that's not necessary: just search for the model name + "GGUF" on huggingface and you'll find several repos where people have quantized it with every possible setting, and you can just download the one you want.
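If you'd rather script the download than click around huggingface, something like this works; the repo and file names here are placeholders, substitute whatever GGUF repo you actually find:
[code]
# sketch: grab a pre-quantized GGUF from huggingface, then point llama.cpp (or ollama) at the file
# repo_id / filename are made up - check the actual repo page for the quants it offers
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="SomeUploader/SomeModel-9B-GGUF",  # placeholder
    filename="somemodel-9b-q6_k.gguf",         # placeholder, Q6_K is roughly the 6-bit quant
)
print(path)  # feed this to llama-server/llama-cli, or import it into ollama with a Modelfile
[/code]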

SillyTavern is honestly optional if you just want to do basic stuff for now. It has a million bells and whistles for customizing the model personality and adding jailbreaks (or if you want to RP/ERP with it), but if you just want to ask it normal things like "how do I parse XML in python?" or "what should I cook for dinner tonight?", the built-in llama.cpp web interface should be fine (and I assume ollama has something similar).
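And if you ever want to hit it from a script instead of the web UI, llama-server also exposes an OpenAI-style endpoint; rough sketch, assuming the default port and whatever model you loaded:
[code]
# sketch: query a local llama-server over its OpenAI-compatible chat endpoint
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server's default port
    json={
        "model": "local",  # llama-server serves whatever you loaded; the name barely matters
        "messages": [{"role": "user", "content": "how do I parse XML in python?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]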

/aicg/ should be able to help you with sillytavern, if it turns out you do want it. They mostly hook it up to corpo models, but you can connect it to ollama/llama.cpp and nearly everything works just the same.
And as the other anon said, /lmg/ can help you with llama.cpp. Maybe sillytavern as well, but I'd expect /aicg/ to know more (as far as I know it's still the thing they all use)
>>
>>108191985
I'm also wondering this
>>
>>108191985
32B iq1_xxs
>>
>>108192182
still waiting for an answer to this

what are you people actually using AI for
>>
>>108193458
Sounds like OP just wants to try it out and see what it can do

I've been using MiniMax M2.5 + OpenCode and it's somewhat useful for coding. It's particularly good for things that are simple but tedious, like "find every function that takes `x: Foo, y: Bar` and flip it around to `y: Bar, x: Foo`" - implicitly it knows that it should also fix all the call sites and make sure it builds and passes tests. Running IQ4_NL with 128 GB RAM + 24 GB VRAM (4090), which gives 8-9 tok/s. It's fairly slow so I normally send it off to run in the background (working on its own separate copy of the code) while I do something else.
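For anyone who hasn't used these agents, the kind of edit I mean is literally this (toy example, Foo/Bar standing in for real types):
[code]
# toy before/after of the "flip the argument order" refactor; Foo/Bar are stand-ins
class Foo: ...
class Bar: ...

# before: def scale(x: Foo, y: Bar) and call sites like scale(foo, bar)
# after: the agent flips the signature AND every call site to match
def scale(y: Bar, x: Foo) -> None:
    print(type(x).__name__, type(y).__name__)

scale(Bar(), Foo())  # updated call site
[/code]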
>>
>>108193626
>see what it can do

You can see what it can do.

It makes shitty AI videos and images.

I will never buy an AI videogame.
>>
>>108193458
I made a receipt parser using deepseek ocr. Sorts receipts by card number and renames them based on date. Much better than tesseract and a zillion regex rules
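Post-OCR it's mostly just file wrangling; rough shape of the sort/rename step (regexes and folder layout here are placeholders, not my actual rules, and the OCR text is assumed to already be extracted):
[code]
# rough sketch of the sort/rename step, AFTER OCR has already produced plain text
# date/card regexes and folder layout are placeholders
import re
import shutil
from pathlib import Path

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")         # placeholder: ISO-ish dates
CARD_RE = re.compile(r"(?:\*{4}|x{4})\s*(\d{4})", re.I)  # placeholder: "**** 1234" style

def file_receipt(image_path: Path, ocr_text: str, out_root: Path) -> Path:
    date = DATE_RE.search(ocr_text)
    card = CARD_RE.search(ocr_text)
    folder = out_root / (card.group(1) if card else "unknown_card")
    folder.mkdir(parents=True, exist_ok=True)
    stem = "-".join(date.groups()) if date else "undated"
    dest = folder / f"{stem}{image_path.suffix}"
    shutil.copy2(image_path, dest)
    return dest
[/code]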
>>
>>108193877
receipts for a business?


