So I got one of these things. It's got 8GB of RAM and all the CUDA whatevers. I've been toying around with ollama to run some models, but there's honestly so much out there it's hard to sift through. 4B seems to be the comfortable limit, at least with most things I've tried, but they all feel so samey and corporate.
Anything you guys get up to with small local models? The AI chatbot general seems to just talk about online models, so I figured I'd start a new thread.
>>108191985
Mistral is always a good bet in my experience. Feels nice and talkative without a lot of isms. Be sure to use a preset with SillyTavern, like the spaghetti one.
>>108191985
>4b seems to be the comfortable limit
Only if you run it with full 16-bit weights. You can quantize down to 6-8 bits per weight with essentially no noticeable quality loss. At 6-bit you can fit a 9B model with no trouble.
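Back-of-the-envelope version, if anyone wants the arithmetic (weights only; KV cache and runtime overhead need extra headroom on top):
[code]
# Rough VRAM estimate: parameters * bits per weight / 8 = bytes for weights.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for params, bits in [(4, 16), (9, 6), (9, 8)]:
    print(f"{params}B @ {bits}-bit: {weight_gb(params, bits):.1f} GB")

# 4B @ 16-bit: ~7.5 GB -> barely fits in 8 GB, hence the "comfortable limit"
# 9B @ 6-bit:  ~6.3 GB -> fits with room to spare for KV cache
# 9B @ 8-bit:  ~8.4 GB -> already too big for an 8 GB card
[/code]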
>>108191996
OK, so I need to go research what SillyTavern is and what spaghetti means in this context...
>>108192018
So I've seen models in ollama that are "quantized" and I knew it had something to do with making bigger models run smaller, but how does one go about doing that? My assumption is that I would have to train the model myself? On ollama, if a quantized version is available, you pull it from there. How would one go about quantizing a bigger model yourself?
I'm now assuming that I would have to leave the ollama ecosystem to do this sort of thing.
>>108191985
The question is: what are you using the models to do? That's when your RAM etc. comes into question.
Another general that can answer your questions is
>>>/g/lmg
They're the dedicated local model general.
>>108192071
Never used ollama myself, but from what I've heard it has options for downloading quantized versions of some models. If you use llama.cpp directly, it comes with a tool for quantizing models (no training required). But usually that's not necessary: just search for the model name + "GGUF" on huggingface and you'll usually find several repos where people have quantized it at every possible setting, and you can just download the one you want.
SillyTavern is honestly optional if you just want to do basic stuff for now. It has a million bells and whistles for customizing the model personality and adding jailbreaks (or if you want to RP/ERP with it), but if you just want to ask it normal things like "how do I parse XML in python?" or "what should I cook for dinner tonight?", the built-in llama.cpp web interface should be fine (and I assume ollama has something similar).
/aicg/ should be able to help you with SillyTavern if it turns out you do want it. They mostly hook it up to corpo models, but you can connect it to ollama/llama.cpp and nearly everything works the same.
And as the other anon said, /lmg/ can help you with llama.cpp. Maybe SillyTavern as well, but I'd expect /aicg/ to know more (as far as I know it's still the thing they all use).
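And if you'd rather script it than use the web UI: llama-server speaks the OpenAI-style chat API. Rough sketch (default port 8080, adjust to however you launched the server):
[code]
# Minimal client for llama-server's OpenAI-compatible endpoint.
# Assumes something like `llama-server -m your-model.gguf` is running.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server default port
    json={
        "messages": [
            {"role": "user", "content": "How do I parse XML in python?"}
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]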
>>108191985
I'm also wondering this
>>108191985
32B iq1_xxs
>>108192182
Still waiting for an answer to this. What are you people actually using AI for?
>>108193458
Sounds like OP just wants to try it out and see what it can do.
I've been using MiniMax M2.5 + OpenCode and it's somewhat useful for coding. It's particularly good for things that are simple but tedious, like "find every function that takes `x: Foo, y: Bar` and flip it around to `y: Bar, x: Foo`". Implicitly it knows that it should also fix all the call sites and make sure the project builds and passes tests. Running IQ4_NL with 128 GB RAM + 24 GB VRAM (4090), which gives 8-9 tok/s. It's fairly slow, so I normally send it off to run in the background (working on its own separate copy of the code) while I do something else.
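Toy version of what that kind of task looks like, just to be concrete (Foo/Bar and attach are placeholders, not real code):
[code]
# Hypothetical before/after for the parameter swap described above.
class Foo: ...
class Bar: ...

# Before: def attach(x: Foo, y: Bar) -> None ... called as attach(foo, bar)

def attach(y: Bar, x: Foo) -> None:  # after: parameters flipped
    print(f"attaching {x!r} to {y!r}")

foo, bar = Foo(), Bar()
attach(bar, foo)  # the model also flips every call site to match
[/code]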
>>108193626
>see what it can do
You can see what it can do. It makes shitty AI videos and images. I will never buy an AI videogame.
>>108193458
I made a receipt parser using DeepSeek-OCR. It sorts receipts by card number and renames them based on date. Much better than Tesseract and a zillion regex rules.
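The sort/rename half is basically just this. ocr_receipt() stands in for whatever OCR backend you use, and the regexes here are illustrative, not the ones I actually run:
[code]
# Sketch of the sort/rename step after OCR has produced plain text.
import re
import shutil
from pathlib import Path

DATE_RE = re.compile(r"(\d{4})[-/](\d{2})[-/](\d{2})")
CARD_RE = re.compile(r"[Xx*]{4,}\s?(\d{4})")  # masked PAN, keep last 4 digits

def ocr_receipt(path: Path) -> str:
    raise NotImplementedError("call your OCR model here")

def file_receipt(src: Path, out_dir: Path) -> Path:
    text = ocr_receipt(src)
    date = DATE_RE.search(text)
    card = CARD_RE.search(text)
    folder = out_dir / (f"card-{card.group(1)}" if card else "unknown-card")
    folder.mkdir(parents=True, exist_ok=True)
    name = "-".join(date.groups()) if date else "undated"
    dest = folder / f"{name}{src.suffix}"
    shutil.copy2(src, dest)  # copy instead of move, so nothing gets lost
    return dest
[/code]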
>>108193877
Receipts for a business?