So I got one of these things. It's got 8GB of RAM and all the CUDA whatevers. I've been toying around with ollama to run some models, but there's honestly so much out there it's hard to sift through. 4B seems to be the comfortable limit, at least with most things I've tried, but they all feel so samey and corporate.
Anything you guys get up to with small local models? The AI chatbot general seems to just talk about online models, so I figured I'd start a new thread.
>>108191985
Mistral is always a good bet in my experience. Feels nice and talkative without a lot of isms. Be sure to use a preset with SillyTavern, like the spaghetti one.
>>108191985
>4b seems to be the comfortable limit
Only if you run it with full 16-bit weights. You can quantize down to 6-8 bits per weight with essentially no noticeable quality loss. At 6-bit you can fit a 9B model with no trouble.
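Back-of-the-envelope version, if anyone wants the arithmetic (weights only; KV cache and runtime overhead need extra headroom on top):
[code]
# Rough VRAM estimate: parameters * bits per weight / 8 = bytes for weights.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for params, bits in [(4, 16), (9, 6), (9, 8)]:
    print(f"{params}B @ {bits}-bit: {weight_gb(params, bits):.1f} GB")

# 4B @ 16-bit: ~7.5 GB -> barely fits in 8 GB, hence the "comfortable limit"
# 9B @ 6-bit:  ~6.3 GB -> fits with room to spare for KV cache
# 9B @ 8-bit:  ~8.4 GB -> already too big for an 8 GB card
[/code]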
>>108191996
OK, so I need to go research what SillyTavern is and what spaghetti means in this context...
>>108192018
So I've seen models in ollama that are "quantized" and I knew it had something to do with making bigger models run smaller, but how does one go about doing that? My assumption is that I would have to train the model myself? On ollama, if a quantized version is available, you pull it from there. How would one go about quantizing a bigger model yourself?
I'm now assuming that I would have to leave the ollama ecosystem to do this sort of thing.
>>108191985
The question is: what are you using the models to do? That's when your RAM etc. comes into question.
Another general that can answer your questions is
>>>/g/lmg
They're the dedicated local model general.
>>108192071
Never used ollama myself, but from what I've heard it has options for downloading quantized versions of some models. If you use llama.cpp directly, it comes with a tool for quantizing models (no training required). But usually that's not necessary: just search for the model name + "GGUF" on huggingface and you'll usually find several repos where people have quantized it at every possible setting, and you can just download the one you want.
SillyTavern is honestly optional if you just want to do basic stuff for now. It has a million bells and whistles for customizing the model personality and adding jailbreaks (or if you want to RP/ERP with it), but if you just want to ask it normal things like "how do I parse XML in python?" or "what should I cook for dinner tonight?", the built-in llama.cpp web interface should be fine (and I assume ollama has something similar).
/aicg/ should be able to help you with SillyTavern if it turns out you do want it. They mostly hook it up to corpo models, but you can connect it to ollama/llama.cpp and nearly everything works the same.
And as the other anon said, /lmg/ can help you with llama.cpp. Maybe SillyTavern as well, but I'd expect /aicg/ to know more (as far as I know it's still the thing they all use).
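And if you'd rather script it than use the web UI: llama-server speaks the OpenAI-style chat API. Rough sketch (default port 8080, adjust to however you launched the server):
[code]
# Minimal client for llama-server's OpenAI-compatible endpoint.
# Assumes something like `llama-server -m your-model.gguf` is running.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server default port
    json={
        "messages": [
            {"role": "user", "content": "How do I parse XML in python?"}
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]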
>>108191985
I'm also wondering this
>>108191985
32B iq1_xxs
>>108192182
Still waiting for an answer to this. What are you people actually using AI for?
>>108193458
Sounds like OP just wants to try it out and see what it can do.
I've been using MiniMax M2.5 + OpenCode and it's somewhat useful for coding. It's particularly good for things that are simple but tedious, like "find every function that takes `x: Foo, y: Bar` and flip it around to `y: Bar, x: Foo`". Implicitly it knows that it should also fix all the call sites and make sure the project builds and passes tests. Running IQ4_NL with 128 GB RAM + 24 GB VRAM (4090), which gives 8-9 tok/s. It's fairly slow, so I normally send it off to run in the background (working on its own separate copy of the code) while I do something else.
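Toy version of what that kind of task looks like, just to be concrete (Foo/Bar and attach are placeholders, not real code):
[code]
# Hypothetical before/after for the parameter swap described above.
class Foo: ...
class Bar: ...

# Before: def attach(x: Foo, y: Bar) -> None ... called as attach(foo, bar)

def attach(y: Bar, x: Foo) -> None:  # after: parameters flipped
    print(f"attaching {x!r} to {y!r}")

foo, bar = Foo(), Bar()
attach(bar, foo)  # the model also flips every call site to match
[/code]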
>>108193626
>see what it can do
You can see what it can do. It makes shitty AI videos and images. I will never buy an AI videogame.
>>108193458
I made a receipt parser using DeepSeek-OCR. It sorts receipts by card number and renames them based on date. Much better than Tesseract and a zillion regex rules.
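The sort/rename half is basically just this. ocr_receipt() stands in for whatever OCR backend you use, and the regexes here are illustrative, not the ones I actually run:
[code]
# Sketch of the sort/rename step after OCR has produced plain text.
import re
import shutil
from pathlib import Path

DATE_RE = re.compile(r"(\d{4})[-/](\d{2})[-/](\d{2})")
CARD_RE = re.compile(r"[Xx*]{4,}\s?(\d{4})")  # masked PAN, keep last 4 digits

def ocr_receipt(path: Path) -> str:
    raise NotImplementedError("call your OCR model here")

def file_receipt(src: Path, out_dir: Path) -> Path:
    text = ocr_receipt(src)
    date = DATE_RE.search(text)
    card = CARD_RE.search(text)
    folder = out_dir / (f"card-{card.group(1)}" if card else "unknown-card")
    folder.mkdir(parents=True, exist_ok=True)
    name = "-".join(date.groups()) if date else "undated"
    dest = folder / f"{name}{src.suffix}"
    shutil.copy2(src, dest)  # copy instead of move, so nothing gets lost
    return dest
[/code]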
>>108193877
Receipts for a business?