/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102406696 & >>102396290

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102406696

--Techniques to improve Nemo's intelligence and output: >>102415200 >>102415228 >>102415273
--Recommended LLM setup for 24GB VRAM with text and PDF analysis capabilities: >>102415003 >>102415082 >>102415303 >>102415331 >>102415946 >>102415994
--Continuous batching and backend options for scalable LLM web site: >>102413520 >>102413675 >>102413997 >>102414699 >>102414824 >>102414134 >>102414873 >>102416052 >>102416096
--XTTS and Fish recommended for local AI text-to-speech, but XTTS is a dead project: >>102407218 >>102407255 >>102407327 >>102407418 >>102407454
--Technique to prevent model from speaking for user works well: >>102412758 >>102412777 >>102412810 >>102412852 >>102412986 >>102413030 >>102412875 >>102412908 >>102412937 >>102413643
--TPUs/NPUs have limited memory bandwidth, making them unsuitable for running language models: >>102410589 >>102411137 >>102411228
--Suggestions for finding and using open source LLMs for a project: >>102408387 >>102408419 >>102408455 >>102408500 >>102409110 >>102409121 >>102409334 >>102414969
--Python's limitations and improvements in llama.cpp/GGML: >>102406946 >>102407068 >>102408347 >>102408641 >>102408665
--Open source Cursor IDE and Claude Dev for solo projects: >>102409321 >>102409909
--New g1 model based on Llama-3.1, potential competition for OpenAI: >>102408681 >>102408873 >>102408923
--Gemma Scope AI model's views on racially charged language and corpospeak: >>102412169 >>102412287 >>102412313 >>102412417
--Doubts about open source models creating 3D playable Doom in React template within a year: >>102411349 >>102411437 >>102411894
--Advice on running Llama 3.1 8B locally for stackoverflow-tier questions: >>102406862 >>102406980 >>102407053 >>102407070 >>102407236 >>102407486 >>102407503 >>102407095 >>102407134
--Miku (free space): >>102406782 >>102408146 >>102411132 >>102412599

►Recent Highlight Posts from the Previous Thread: >>102406725
>>
File: 50 Days Until November 5.png (1.22 MB, 1472x1104)
>>
Does anyone here have any experience with older business-oriented GPUs? I noticed I can get an Nvidia GRID K1 (16 GB GDDR3) for relatively cheap, and I was wondering if it would be useful for running local models.
>>
>>102417346
>Bandwidth 28.51 GB/s x4
Your RAM is going to be quicker at that point. The rule of thumb is to never go older than Pascal (P40, etc.); those are the oldest cards the CUDA dev still optimizes llama.cpp for.
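Back-of-the-envelope, since generation is memory-bound: tokens/sec tops out around bandwidth divided by the bytes read per token (roughly the size of the quantized weights). A quick sketch with an assumed ~4 GB quantized 7B model:
[code]
# Generation speed ceiling for a memory-bound LLM: every generated token
# requires streaming (approximately) all model weights once.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# One GRID K1 GPU (28.51 GB/s) vs. typical dual-channel DDR4 (~50 GB/s),
# assuming a hypothetical 7B model quantized to ~4 GB:
print(max_tokens_per_sec(28.51, 4.0))  # ~7 t/s best case per K1 GPU
print(max_tokens_per_sec(50.0, 4.0))   # ~12 t/s straight from system RAM
[/code]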
>>
>>102414452
Imagine a Chinese SOTA 100B at 1.58 bpw that runs on a single GPU, and you can't just quant it higher to justify your multi-GPU rig.
>>
>>102417567
Monkey's paw. It's going to be the biggest sloppinator you've ever seen.
>>
>>102417592
Qwen models have never been good; I don't see any reason to expect that to change just because they tuned one on the latest CoT scam.
>>
>>102417567
unprompted
unironically rent free 24/7
>>
File: bugs.png (209 KB, 1238x1340)
>>102416919
wonder if bigger models would've made it scarier
>>
I would like to say that Drummer is a hack and a fraud.
>>
I think Sao is a faggot.
>>
nigger
>>
>>102417686
you're so right sao
hanami is kino btw
>>
Has anyone experimented with trying to calibrate tail-free sampling or typical-p for a particular model? How did you find good values? Did you edit transformers / llama.cpp to return a bunch more data?
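For what it's worth, transformers can already return the per-step scores from generate() without any edits. A minimal sketch (model name is just a placeholder) that prints the sorted head of each step's distribution so you can see where a typical-p or TFS cutoff would land:
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-Nemo-Instruct-2407"  # placeholder model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The quick brown fox", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8, do_sample=True,
                     output_scores=True, return_dict_in_generate=True)

# out.scores holds one [batch, vocab] logit tensor per generated step;
# sorting each step's probabilities shows the mass a sampler would cut.
for step, logits in enumerate(out.scores):
    probs = torch.softmax(logits[0], dim=-1).sort(descending=True).values
    print(step, probs[:10].tolist())  # dump more than 10 to calibrate the tail
[/code]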
>>
>>102414452
>1__
10B
>>
>>102417415
I was actually thinking about image generation. But I guess it's not very useful for that either, then?
>>
File: 1717889148027915.png (183 KB, 600x530)
phi vision/qwen vision/pixtral compatibility status?
wanted to hook up my shitty frontend to kcpp or llama.cpp to look through my archive and bring up images based on the text content in them
>>
>>102417741
I have no clue about imgen requirements these days but I doubt that a card with that bandwidth and compute on the level of a Radeon 4670/GT240 is going to be useful at all.
>>
Can't wait until there are agents, so I can tell one to make 10 different programming projects that take too much effort for me to make myself.
>>
>>102417847
I would pay for an agent service that could spit out even a half-working, medium-complexity project.
>>
>>102417847
Now if only you had an agent that could build those agents for you
>>
>>102417808
>phi vision/qwen vision/pixtral compatibility status?
never, phi vision has been out for ages and still isn't supported
any alternatives to llama.cpp?
>>
>>102417229
Local models for translating German to English? I have photocopied a couple of German engineering books for a prototype nuclear reactor, but I don't want to hand over their contents to Skynet.
>>
Any LLM that can cure my crippling depression?
>>
>>102417995
Fuck no.
It might help you sink even deeper into it.
>>
So will kiwi be a thing or are they just making fun of the strawberry larp?
>>
>>102417287
Is this Flux? If so, it's so cool to see that it can even write text in perspective like this. I still find it hard to believe this shit is actually free and open source. Maybe it's okay to believe that some day we'll have old c.AI locally.
>>
>>102417287
>Migugle
my sides
>>
>>102417995
no but they might delay your inevitable suicide
>>
>>102418042
>Maybe it's okay to believe that some day we will have old c.AI locally.
Why wouldn't you?
>>
>>102418031
believe in kiwi
>>
>>102418194
The trend is more and more slopped LLMs
>>
>>102417669
Seems nice. I didn't have much luck with chronos gold or celeste, but do you like the merge of them?
>>
>>102418042
yes flux is pretty amazing. as always, it's gooning that leads the way. Yes, I know it's sfw, but that's what will drive further innovation.
Also, stuff like c.ai locally is not far off.
I'd be set with a sonnet 3.5 to run locally but I'd need a gorillion vrams. maybe someday.
>>
>>102417229
What model can I use to count the number of people (and their race) in a thumbnail?

Do I just need YOLOv8?
>>
>>102418210
it's adequate, i'm enjoying it.
i'm not in love with it, i've just used nemomix unleashed, stardustv2, and lyrav4 to the point where they felt stale.
>>
>>102418241
yes if you have the annotated data to train it with
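For just the head count, though, a COCO-pretrained detector works out of the box; attributes like race would need your own annotated data and a second model. A minimal sketch with ultralytics (image path is a placeholder):
[code]
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # COCO-pretrained nano model
results = model("thumbnail.jpg")  # placeholder path

# Count detections whose class id is 0 ("person" in COCO).
people = sum(1 for c in results[0].boxes.cls if int(c) == 0)
print(f"{people} people detected")
[/code]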
>>
>>102417737
god imagine if their ebin thursday is just a nemo competitor
>>
>>102418392
as a vramlet, i'd dance in joy
>>
>>102418440
Qwen isn't Mistral. I'm not very hyped even for this 100B model, I bet it will be pure slop like their 72B.
>>
>>102418468
>Mistral
mistral puts out dry shitty models but nobody here gets that because nobody here reads books
>>
>>102418491
I read plenty of text online, and it's much more varied than anything I could find in generic books
>>
>>102418563
>quantity over quality
yeah I don't doubt that
>>
I'm using Mistral Nemo 12B in SillyTavern.
Sometimes it's quite great; other times it repeats itself a lot.
I know this is a completely common thing with AI chatbots.
What I'm wondering is: sometimes it happens after a pretty lengthy conversation, and other times it happens literally immediately in a new chat. Is there any way to avoid the latter? I sometimes get stuck in a loop, creating new chats just to try to get the character to not talk unnecessarily verbosely and repeat itself. Usually I would prevent this by editing messages along the way, but there's no real way to do that if it's happening immediately. Why are my results (on the same character, mind you) so wildly varied, and why does it sometimes seem to get really stuck on being fucked up? Is it just bad RNG?
>>
>>102418640
why are you asking questions about using LLMs in /lmg/?
>>
>>102418640
Yeah, that's a Nemo thing for you.
I suggested these samplers:
>temp 5, minP 0.1, top K 3, everything else off
and got two positive and one negative response, so you might as well try it.
Also, try mini-magnum and lyrav4; see if either of those suffers less from repetition.
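If temp 5 sounds insane: top-K and min-P usually run before temperature in llama.cpp's sampler order, so the temperature only flattens whatever survives the cut. A rough numpy sketch of the idea (not llama.cpp's actual code):
[code]
import numpy as np

def sample(logits: np.ndarray, top_k=3, min_p=0.1, temp=5.0) -> int:
    # top-K: keep only the K highest logits
    mask = logits >= np.sort(logits)[-top_k]
    # min-P: also drop candidates below min_p * (max probability)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    mask &= probs >= min_p * probs.max()
    # temperature applied last: it only flattens the few survivors
    scaled = np.where(mask, logits / temp, -np.inf)
    p = np.exp(scaled - scaled.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))

# Toy distribution: one dominant token, two plausible ones, junk tail.
logits = np.array([8.0, 6.5, 6.0, 2.0, 1.0, 0.5])
print([sample(logits) for _ in range(10)])  # only indices 0-2 ever appear
[/code]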
>>
>>102418491
i don't know what you're talking about, this is peak literature
>>
>>102418646
I'm using a local model and wasn't sure if I needed to configure something a particular way. Most of the AI chatbot guys seem to use services.
>>
>>102418701
reads worse than AO3 fanfic slop
>>
File: 1639462027420s.jpg (6 KB, 250x140)
>>102418701
>her expression a mix of
>>
File: a.png (451 KB, 1448x940)
>>102418725
try this maybe
>>
https://github.com/WangHelin1997/SSR-Speech
i like this new model. for the last hour i've been doing nothing but rewriting existing speeches by trump and letting him describe all kinds of pussies completely at random in between
a toast to technology
>>
>>102417233
That image made me think of Katamari as I've been playing it a lot recently. So fun.
Here's a random generic Mikugen for Migu Monday.
>>
>>102417233
how the fuck would teto know how big or small the world is from space? mf there's no way you know how close or far away you are. god what a faggot.
>>
what happened to megu monday
>>
https://www.reddit.com/r/LocalLLaMA/comments/1fidhib/new_model_identifies_and_removes_slop_from/
>>
>>102419193
Not enough tokens
>>
>>102419193
What the fuck is a megu
>>
>>102419206
>If you'd like to learn more about this project, you can join the Exllama Discord
Fucking discordfags.
>>
>>102419290
>>
>>102417810
Alright, thanks. I was only looking at 'has lots of VRAM' but I guess a lot of VRAM alone doesn't cut it.
>>
>>102418725
Maybe your samplers are fucked? Try
>Temp 0.85
>TopK 0
>TopP 1
>TypicalP 1
>MinP 0.02
>Rep pen 1.2
>DoSample on
>Add BoS

Make sure you're using the Mistral instruct and context presets too.
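If you want to sanity-check the same settings outside SillyTavern, they map directly onto llama-cpp-python's parameters. A minimal sketch (model path and prompt are placeholders):
[code]
from llama_cpp import Llama

llm = Llama(model_path="Mistral-Nemo-12B-Q6_K.gguf", n_ctx=8192)  # placeholder path

out = llm.create_completion(
    "[INST] Write a short greeting. [/INST]",  # Mistral instruct format
    temperature=0.85,
    top_k=0,           # 0 disables top-K
    top_p=1.0,         # disabled
    typical_p=1.0,     # disabled
    min_p=0.02,
    repeat_penalty=1.2,
    max_tokens=256,
)
print(out["choices"][0]["text"])
[/code]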
>>
>(((we)))
>>
Why is this cheaper than the others?
>tugm4470 Gigabyte MZ73-LM1 + 2x AMD EPYC Genoa SP5
US $2,580.00
I usually see it listed at US $3,230.00.
Scam status?
>>
File: 1725481122010684.jpg (18 KB, 253x289)
llama-server --hf-repo bartowski/Phi-3.5-mini-instruct-GGUF --hf-file ./Phi-3.5-mini-instruct-Q5_K_M.gguf


>ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 51539607584

I'm trying to run it on my CPU. I've got ~8GB of RAM available. Why is it trying to allocate 51GB by default?
>>
>>102419206
>look at what we did!!!
>wow cool, where are the dataset and model weights?
>*radio silence*
fuck off
>>
>>102419095
Post some audio if you can, anon.
>>
Why can't we just copy the human brain?
>>
>>102419724
Set a context size manually.
It's probably defaulting to something huge.
>>
>>102419885
Your brain?
>>
>>102419975
No, it wouldn't be much better.
>>
>>102419899
thanks, -c 512 worked
>>
>>102419998
512 tokens of context is almost nothing. Is that really what you want?
Most people do at least 8192 these days.
>>
>>102419899
According to the docs it defaults to the model's native context size, but I thought Phi was at like 128k.
>>102419998
Not sure if flash attention works on CPU, but you can try it. KV cache quantization also lets you at least halve the memory requirements.
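That 51GB is almost exactly the fp16 KV cache at the full 128k default context, by the way. A quick check, assuming Phi-3.5-mini's published config (32 layers, 32 KV heads, head dim 96):
[code]
# fp16 KV cache bytes = 2 (K and V) * layers * context * kv_heads * head_dim * 2 bytes/elem
layers, kv_heads, head_dim = 32, 32, 96  # Phi-3.5-mini config (assumed)
bytes_fp16 = 2

full_ctx = 131072  # the 128k default
print(2 * layers * full_ctx * kv_heads * head_dim * bytes_fp16)  # 51539607552, ~the failed 51.5 GB alloc
print(2 * layers * 512 * kv_heads * head_dim * bytes_fp16)       # -c 512 -> ~201 MB
[/code]
By the same math, -c 8192 costs ~3.2 GB of cache, which plus the ~2.8 GB of Q5_K_M weights should still roughly fit in 8GB.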
>>
will a 3060 12gig and 32gb of RAM be capable of hosting decent local waifus? asking for a friend
>>
>>102420022
nice double doubles, also you should be fine for the little models
>>
>>102420022
you can run nemo comfortably at 6-bpw on 12gb with 32k context. don't let anyone talk you into running mixtral, it's meme-tier in current year.
>>
>>102417229
>►Getting Started
>https://rentry.org/llama-mini-guide
>https://rentry.org/8-step-llm-guide
>https://rentry.org/llama_v2_sillytavern
>https://rentry.org/lmg-spoonfeed-guide
>https://rentry.org/rocm-llamacpp
>https://rentry.org/lmg-build-guides
You can remove all of these. They're outdated as fuck and newfags prefer to be spoonfed anyway.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.