/g/ - Technology


File: buy-a-fucking-ad-asshole.jpg (396 KB, 1664x2432)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103066795 & >>103057367

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103066795

--Diffusion models merging with LLMs for language generation:
>103073785 >103073859 >103073960 >103074715
--Using local models for visual novel translation:
>103075666 >103075854 >103076003 >103076659 >103077006
--Troubleshooting GPT-SoVITS2 with Silly Tavern:
>103071219 >103071342
--SmolLM2 1.7b can generate a Mandelbrot set, unlike previous Llama models:
>103070970
--Guide to choosing the right model and quantization:
>103068169
--Fitting 4 RTX 3090 GPUs into ASUS PRO WS WRX80E-SAGE SE WIFI motherboard:
>103072146 >103072175 >103072718 >103072763 >103072766 >103073357
--Discussion about AI models, benchmarks, and performance:
>103067158 >103067174 >103067237 >103067246 >103067259 >103067289 >103067326 >103067356 >103068797 >103068828 >103067274 >103067460 >103067826
--Current GPU meta and buying recommendations:
>103066797 >103066998 >103067057 >103067113 >103067157 >103067198 >103067221 >103067228 >103067149 >103067214 >103076090 >103067253 >103067797 >103067801
--Chat and image generation on 10GB VRAM, and consistent anime-style SD models:
>103070025 >103070054 >103070093 >103070229 >103070522 >103070571 >103070619
--Anon tests Noob models on "outstretched hand" prompt, finds Noob 1.0 excels at hand drawing:
>103077300
--Anon shares experience with LLMs for data extraction and discusses challenges and techniques:
>103075416 >103075431 >103076168 >103076216 >103076236 >103076272 >103076273 >103076290 >103076319 >103076260 >103076502 >103076668 >103076773 >103077016
--Anon gets SoVITS working with Illusive Man voice lines:
>103072261 >103072478 >103072527 >103072781 >103073010 >103072548 >103076145
--Konrad's CNN implementation in System Verilog:
>103073134
--Miku (free space):
>103066797 >103068268 >103074576 >103074709 >103076544 >103076601 >103077300

►Recent Highlight Posts from the Previous Thread: >>103066923

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 023a3def6f9.jpg (465 KB, 1024x1024)
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf

64GB VRAM
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf

48GB VRAM
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf

24GB VRAM
anthracite-org/magnum-v4-27b-Q4_K_M.gguf

16GB VRAM
anthracite-org/magnum-v4-12b-v0.1-Q6_K.gguf

12GB VRAM
anthracite-org/magnum-v4-12b-Q4_K_M.gguf

8GB VRAM
anthracite-org/magnum-v4-12b-IQ4_XS.gguf

Potato
>>>/g/aicg

> fite me
>>
>>103077348
>suggesting bad models to newfags
Devilish.
>>
>>103077348
i will run 12b Q4_K_Ms on my 8gb card and you'll never stop me
>>
So what about this discord server?
>>
>>103077399
It's full of pedos and trannies as you'd expect.
>>
>>103076712
>>103077221
If you do go with mistral nemo make sure to enable Do Sample and BOS token if you can as well
>>
So now that Meta claims that Llama 4 will be out early 2025, what are you hoping to see from it?
>>
>>103077484
BitNet
>>
>>103077484
I really fucking hope they dropped ultra ass fuck hard dataset filtering. No matter how smart the model is it won't conjure trivia from nothing.
Please be claude, not gpt.
>>
>>103077414
>It's full of pedos and trannies as you'd expect.
What are you waiting for then?
>>
>>103077484
Good, uncensored base. IDGAF about the official instruct.
>>
>>103077484
Hoping they live up to their promise of multimodality that was supposed to be in Llama 3.
>>
>>103077552
>>103077548
You are hopeless. When will you learn that, unless western society and culture suddenly does a 180, they're not allowed to openly release anything "uncensored"? You should be asking for more Mistral and chink models instead.
>>
>>103077549
I'm probably already on a list so I'd rather behave.
>>
>>103077596
Trump will win #MAGA2024
>>
>>103077598
>only a list
>not all of them
ngmi
>>
>>103077596
Anthropic manages somehow. By the time it comes out, the election will be over so there will be less election "interference" hitpieces. Besides, they can make their instruct as censored as they feel they need to. The important thing is that they don't filter the pretraining data to hell.
>>
File: ySjjPWG.png (207 KB, 580x326)
Does anyone have a ChatTTS python script that loads sample audio and lets me choose/see a seed?
Local TTS models have awfully bad documentation/examples.
I know there is a webui but it's buggy, i find using a script much more efficient.
>>
It is entirely unrealistic to expect them to remove any filters they had. They may not strengthen them. But they probably won't just outright remove them when Llama 3 turned out fine (for their business; coomers don't matter to them). Stop coping and just accept reality. Mistral is about the only hope left for you.

>>103077621
Anthropic is an entirely different company in a different position, producing an entirely different product (or rather, service).

>>103077603
That helps but won't change the business and the values of investors by Q1 2025. And Llama 4 already began training, so they would've played it safe with the dataset to account for the possibility of future unfavorable political landscapes anyway.
>>
>>103077484
I hope ... Who am I lying to? I don't actually have any hope left. The only salvation for LLMs is Anthropic's Opus 3.5
>>
File: 1716744327714974.jpg (818 KB, 2272x1704)
I have 4080S
I use 12b model but it's a bit meh
I tried an 8x7B model but it was a bit slow
I want something around 20-25b model
no idea what q4 or q6 means
only usage: coom
recommendations?
>>
QTIP sounds huge, why isn't anyone talking about it? In their GitHub they mention a 1-bit 405B model they were trying too, which would fit in like 56 gigs
>>
>>103077726
Llama.cpp doesn't support it and people don't want to install shit just to try yet another research project that likely isn't actually that good.
>>
>>103077786
you fags won't use anything that isn't a 1 click exe
>>
>>103077802
yea, you guys suck
>>
>>103077706
Magnum
>>
>>103077726
Because they released quants for Llama 3.1 8B and Llama 3.1 405B. Even someone with quad 3090s can't run their 2 bit quant of Llama 3.1 405B.

(They also released Llama 2 7B, 13B, and 70B, which makes me wonder what they're doing.)
>>
>>103077726
1Bit quantization never works. It's just a slightly better QuIP# and that wasn't worth using either. What good is fitting 70B on a single 3090 if the perplexity doubles?
>>
>>103077484
I just want base models again. However, I expect that we will only get instruct models at 3B and 405B. Without bitnet, of course.
>>
>>103077706
>no idea what q4 or q6 means
It means download magnum
>>
>>103077818
>>103077859
stop being a retarded shill
>>
File: nothing_burger.jpg (31 KB, 800x450)
>>103077726
>>
>>103077484
>what are you hoping to see from it?
Absolutely nothing. It is gonna be shit for cooming. They will do a 9B and a 70B again. It is gonna be an incremental update that is barely noticeable. And the only good thing about it is probably native multi modal. I won't even download.
>>
>>103077973
>native multi modal
I bet it will be adapters again
>>
>>103077484
I hope the new Mistral will mog them.
>>
Give me your best gooner model that works on 16GB of VRAM. The death of Claude is driving me nuts and I need to blow a load stat. I will try literally any model you link me
>>
>>103078467
https://www.cleverbot.com/
>>
>>103078467
405b hermes is free on openrouter
>>
Can I voice chat with a local model in real time yet?
>>
>>103078535
the free endpoint is only 4k context
>>
>>103078555
yup

https://github.com/Standard-Intelligence/hertz-dev
>>
>>103078555
Plenty of options, from moshi to fish agent.
>>
When I was wishing /lmg/ would die I didn't mean for it to become the / /aicg/+caiggers using local models general/... It is just like 4chan in general. A corpse turned into a trophy paraded around by horrible people who should die in a fire.
>>
>>103078535
I'll give it a try but I was hoping to try out some new local models as well. I've tried Mythomax, Mythalion and Moistral before and wasn't impressed, that was months ago thobeit
>>103078493
Not going to try this
>>
>>103078467
random 12B tune I guess
>>
ROCm has failed me for the last time.
>>
>>103078467
That new killa tune released yesterday.
>>
>>103078582
Why did you wish for /lmg/ death in the first place?
>>
>>103078467
Mistral NeMo. Dumb but fun. Try the Instruct model first before you try anyone's fine tunes.
>>
The AI boom has been going on for around two years now, so why is the integration of local models with other programs still so bad? In 2022 I was expecting that by 2025 they would be able to do extremely niche stuff like searching for exhentai cosplay galleries that have comments mentioning nip slips, or booting up and playing games by themselves
>>
>>103078702
Models sucked extra ass 6 months ago
>>
>>103078702
The tech landscape is currently filled with shitty startups with loads of cash trying to milk AI, but there is no one who knows anything about it. I'm getting many proposals from them due to my HF repo. Also they want to do B2B not B2C except the nsfw chatbot thing like muah.ai.
>>
>>103078702
hallucination on local models is still really bad. we're essentially waiting for models to get more accurate at smaller sizes, or for there to be hardware released that allows you to run very large models quite cheaply.
>>
MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
https://arxiv.org/abs/2411.00390
https://github.com/meta-metrics/metametrics
For VNTL anon if you want to mess around with another metric

A Lorentz-Equivariant Transformer for All of the LHC
https://arxiv.org/abs/2411.00446
For Johannes. How is your Master's going btw?
>>
Is there any online model that I can use to summarize my 40k+ token long coom story?
Rocinante can't cope anymore trying to retrieve info even with the context length pumped to 32k and RAG.
Does Claude have a long context length?
>>
>>103079112
Or maybe an uncensored local model specialized to summarize stuff with a gigantic context window? Does that even exist?
>>
>>103077484
Uncensored base + o1 CoT finetune
>>
File: Untitled.png (761 KB, 1080x3184)
PatternBoost: Constructions in Mathematics with a Little Help from AI
https://arxiv.org/abs/2411.00566
>We introduce PatternBoost, a flexible method for finding interesting constructions in mathematics. Our algorithm alternates between two phases. In the first ``local'' phase, a classical search algorithm is used to produce many desirable constructions. In the second ``global'' phase, a transformer neural network is trained on the best such constructions. Samples from the trained transformer are then used as seeds for the first phase, and the process is repeated. We give a detailed introduction to this technique, and discuss the results of its application to several problems in extremal combinatorics. The performance of PatternBoost varies across different problems, but there are many situations where its performance is quite impressive. Using our technique, we find the best known solutions to several long-standing problems, including the construction of a counterexample to a conjecture that had remained open for 30 years.
https://github.com/zawagner22/transformers_math_experiments
Pretty neat
>>
>>103077348
>Qwen
But that's not how you spell Nemotron!
>>
>>103077484
The most critically important component is a lack of censorship. If it doesn't have that, then it's useless at base. Fine-tunes can help a bit with that, but they come at the expense of intelligence. Make an uncensored base model and that intelligence drop is not necessary.

If they're going to include politically correct censorship in the model, then I may as well go with a Corpo cloud model instead.

Local was made to be free.
>>
>>103079133
>>103079112
Did you try Nemo?
>>
>>103077706
Mistral-Small-Instruct 22b Q4_K_M
Magnum v4 22b Q4_K_M
Magnum v4 27b IQ3_M

>no idea what q4 or q6 means
The 'q4' and 'q6' refer to quant sizes. You will need to download the relevant GGUF file to run these, at the correct quant sizes to fit your vram limitations.
>>
>>103079565
It's even worse at it than Rocinante and even more retarded, I just tested it.
>>
>>103079112
>>103079685
Split your text into chunks of 8k or 16k tokens, then summarize every chunk one by one, and finally summarize all the summaries merged together
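A minimal sketch of that chunk-then-merge approach against a local OpenAI-compatible endpoint (koboldcpp and llama.cpp's server both expose one); the URL, chunk size, and prompts below are placeholders, and the character-based chunking is a stand-in for real token counting:
```python
# Chunk-then-merge summarization sketch. Assumptions: an OpenAI-compatible server
# on localhost:5001 (koboldcpp default), story in story.txt, ~16k chars per chunk.
import requests

API = "http://localhost:5001/v1/chat/completions"  # assumed endpoint

def ask(prompt: str) -> str:
    r = requests.post(API, json={
        "model": "local",  # most local servers ignore this field
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.3,
    }, timeout=600)
    return r.json()["choices"][0]["message"]["content"]

def chunk(text: str, size: int = 16000) -> list[str]:
    # crude character-based split; swap in a tokenizer if exact token counts matter
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(story: str) -> str:
    partials = [ask("Summarize this story excerpt, keeping names and key events:\n\n" + c)
                for c in chunk(story)]
    return ask("Merge these partial summaries into one coherent summary:\n\n"
               + "\n\n".join(partials))

if __name__ == "__main__":
    with open("story.txt", encoding="utf-8") as f:
        print(summarize(f.read()))
```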
>>
>>103079112
Try wizard 8x22 or nous hermes 405B on openrouter.
>>
>>103079901
Tried Hermes and it sucked ass.
>>
>>103079964
What about wizard?
>>
>>103079967
Tried wizard and it sucked dick.
>>
>>103079969
I don't know if llms can do what you're asking right now. You can try chunking but that's probably a cope.
>>
>>103079112
Qwen 2.5 has enough context although I don't know if it has enough coherence.
>>
>>103079967
Not yet but I think it won't make a difference. I may have to chunk like I did some time ago, as some other anon suggested. But from experience, summarizing by chunking and then feeding a RAG to the model doesn't do a good job for continuing a story; it's gonna suck in a lot of ways.
>>103079969
Stfu retard.
>>
Good eRP text LLM for 24GB VRAM nvidia GPU? magnum-4-27B-Q4 is disappointing and giving duplicate generations no matter how much I change the prompts or tweak the values.
>>
>>103079112
>Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%
>405B is only 6 points away
You may need to use other techniques in order to enhance the capability of an LLM to do summarization, such as prompting the LLM to do state tracking and summarizing every event checkpoint or scene transition. People were discussing an automated system to do this in the past, but I guess it turned into vaporware.
>>
Someday...
>>
what practical model size is anon running for daily use? 8b, 70b?
>>
>>103080398
AMD's new 1B model.
>>
>>103078982
I'm already done with my Master's degree and currently doing a PhD.
If things go well I'll use GGML to fit parton density functions and the strong coupling constant.
>>
>>103080403
cactus
>>
>>103080188
Nemotron 70b IQ2_S fits with a 4-bit cache and flash attention on, and is way better than smaller models.
>>
>>103080706
I feel like that could actually be true but at the same time I kind of feel bad about lobotomizing something that much even if it is just an algorithm...
>>
>>103079112
Chunk your story into 16K-token pieces, then summarize the first part and inject that as context to summarize the second part.
>>
i want to go back bros...
back when i just installed st and had hot maid sex with pyggy and mythomax and summarize feature
>>
Is a CPU-only setup with a bunch of RAM a reasonable alternative to GPU? I'm okay with 1 token/s for 100b+ models
>>
File: 1724632017060237.jpg (43 KB, 411x418)
>>103081277
>1 token/s for 100b+ models on CPU
You wish
>>
>>103081317
Perplexity says you can get 5-10 token/s for 70b on CPU
>>
>>103081277
Don't know if intel's new ai chip works as they claim.
It's technically still GPU though with their built-in Intel® Arc™ graphics.
>>
i want to learn how to fine tune models. specifically, i've been looking at papers where they're using audio transformers to classify bird sounds.

this model:
https://github.com/cwx-worst-one/EAT

has pretty good performance and is pre-trained on AudioSet which is a bunch of youtube audio clips. in papers, they take a bunch of 10 second audio clips, convert them into spectrograms, augment them with stuff like specaug, "fine-tune the model with adamW".

i don't know what that means. i understand how i could generate spectrograms and modify them and stuff, but what the fuck does "using adamW" mean. it's an optimizer, from what i understand, but how do i take the fuckin spectra of bird songs and make the model do math on my GPU?

in the EAT github it looks like pytorch is being used. can i just try and follow some sort of huggingface guide and that'll work? i feel like im nearly drowning here.
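"Fine-tune with AdamW" just means AdamW is the optimizer that applies the weight updates; here's a bare-bones PyTorch sketch of the loop (the dataset, model, and shapes are placeholders, not EAT's actual classes — real spectrograms and the pretrained checkpoint go where the fakes are):
```python
# Minimal PyTorch fine-tuning loop with AdamW. Placeholder data and model:
# in practice you load EAT's pretrained backbone and your bird-song spectrograms.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

specs = torch.randn(64, 1, 128, 1024)          # fake (batch, 1, mel bins, frames)
labels = torch.randint(0, 10, (64,))           # fake labels for 10 bird species
loader = DataLoader(TensorDataset(specs, labels), batch_size=8, shuffle=True)

# stand-in for a pretrained backbone plus a new classification head
model = nn.Sequential(nn.Flatten(), nn.Linear(128 * 1024, 256), nn.ReLU(), nn.Linear(256, 10))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)   # forward pass runs on the GPU if available
        loss.backward()                 # backprop computes gradients
        optimizer.step()                # AdamW updates the weights
    print(f"epoch {epoch} loss {loss.item():.4f}")
```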
>>
>>103081442
With llama.cpp offloading nothing into VRAM I run a 32B at 1.5 tokens/second and a 70B around half a token per second with 2-channel 2667 MT/s DDR4 RAM. If RAM bandwidth is the limiting factor, as a first order of approximation it seems plausible to me that by going up to 16 channel RAM and DDR5 instead of DDR4 someone could run a 70B about 8 * 4800 / 2667 = 14.4 times faster than I can. That would be around 7 tokens/second. Math checks out.
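The same estimate as a few lines of arithmetic, under the stated assumption that decode speed scales linearly with memory bandwidth and nothing else changes:
```python
# First-order scaling: tokens/s proportional to RAM bandwidth (channels * transfer rate).
base_tps  = 0.5          # measured 70B speed on 2-channel DDR4-2667
base_bw   = 2 * 2667     # channels * MT/s (bytes per transfer cancels out)
target_bw = 16 * 4800    # hypothetical 16-channel DDR5-4800
scale = target_bw / base_bw
print(scale, base_tps * scale)   # ~14.4x, ~7.2 tokens/second
```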
>>
>>103081277
Saw a youtuber get 0.06t/s on a 405b model.
h/w was Threadripper Pro 7995wx + 256gb ram. 96c 192t. 8-channel ram.
>>
File: pepeoui.png (224 KB, 645x653)
anons... i'm tired of the cope, i'm tired of the slop...
I went back by curiosity to text-to-image local AIs and it's so much easier to get what you want from these
when the fuck are we going to be eating good bros...
>>
>>103081277
0.7
take it or leave it
>>
>>103081654
Largestral
>>
>>103081654
i had the opposite experience yesterday
flux was making really pretty images, but not really doing what i was prompting for, and my itty bitty 12b was perfectly simulating my warring states period china qin kingdom royal harem
>>
not sure if this was posted yet in here:
https://arxiv.org/abs/2410.16454
>This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information. To thoroughly evaluate this phenomenon, we conduct comprehensive experiments using various quantization techniques across multiple precision levels. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21\% of the intended forgotten knowledge in full precision, which significantly increases to 83\% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy to mitigate this intricate issue...
>>
>>103081745
>An embarrassingly simple approach
Who comes up with these faggy titles and why?

Anyway, this just sounds like more incentive for corpos to be more aggressive when filtering.
>>
>>103081654
Text-to-image was a pain for me last time I checked, 90% of the time using them I was inpainting things and tweaking the settings because I had a very specific thing in my mind.

But textgen is also similar in that I am a compulsive reroller, probably a me issue.
>>
>>103081587
Start over https://d2l.ai/
>>
>>103081708
Okay Chang
>>
>>103081883
i got so good at prompting Pony and optimizing comfyui that I always get what i want, llms are so much more random and i feel like most samplers are placebo anyway
>>
>>103081745
The model just forgot that it needs to forget things lol
>>
>>103081988
they kinda are desu
the best option is just temp minp and prompting half well
>>
>>103081989
How do we get it to remember to forget?
>>
>>103077705
Son, Sonnet 3.5 New was actually a failed Opus, but they used that name to cope. Opus 3.5 is never coming.
>>
>>103078467
>The death of Claude
What?
>>
>>103082190
It will drop the day after some new model beats Sonnet 3.5. They have no reason to release any earlier than that.
>>
>>103077487
>BitNet
this, if we really want to advance in this field, BitNet must be a thing
>>
>>103082190
Opus is just dead. All the big players have realized that there is no point in training expensive 65B models like Opus when you can get even better performance with just a simple 22B like Sonnet 3.5
>>
>>103078732
What's on your hf?
>>
>>103082294
My cock pics.
>>
>>103082294
LLMs & NLP models and a few vision models
>>
File: o1.jpg (140 KB, 1080x495)
o1 full release today. can you feel? are you excited?
>>
File: 1704364522682294.gif (2.3 MB, 498x421)
>>103082696
No I don't
>>
>>103082696
Imagine paying $10 to find out how many Rs strawberry has. At this point it's cheaper to hire chink farms or pajeet farms; the accuracy would probably be higher too.
>>
>>103081277
With ddr5 63gb/s bandwidth I get about 0.45 t/s in largestral.
By that logic, 12-channel would in theory be 6x faster, meaning 0.45 × 6 = 2.7.
In practice however it would probably be just above 1t/s, maybe 1.5?
>>
File: file.png (28 KB, 417x588)
>>103082696
I've had the preview for weeks and I don't even use it because the weekly limit deters me. Is the "full release" better in any way?
>>
>>103082793
The full version will be RLHF'd using the feedback of millions of pajeets.
>>
File: file.png (121 KB, 859x1206)
>>103082793
The search feature on the other hand is pretty cool. I thought it indexes websites like once a day because it reads them so quickly, but it's actually realtime.
>>
>>103082811
Take your pajeet cloudshit elsewhere, Sam.
>>
>>103082696
>we
>>
>>103082810
So it'll just be more accurate? Probably still won't use it then, I'm an engineer but I rarely need to know more than the latest webshitter technology which 4o does fine
>>
>>103082831
>pajeet
>more accurate?
>I'm an engineer
God help us all.
>>
>>103082844
Bet you don't even know what the Outbox design pattern is
>>
File: NanoPi M6.jpg (231 KB, 900x630)
>>103077338
>there are now single board computers with 32 GB RAM and a built-in display
Has someone already put together a project where you can tell a computer/phone to generate an image in natural, spoken language?
>>
>>103081277
See the op. https://rentry.co/miqumaxx/
Hope you’re not poor
>>
>>103082877
>Wasting five minutes to come up with something to tell your computer/phone instead of using a few keywords
>>
>>103082925
I have small children in my family so the idea is that I would let them directly say to the thing what kind of image they want.
>>
>>103077348
why are the models listed different every thread
>>
>>103083071
Xe is le enlightened sekrit club gatekeeper, please understand.
>>
>>103082877
It sounds doable if you know basic python programming.
>>
>>103082877
You just need whisper and send the output to SD
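A rough sketch of that pipeline with openai-whisper and diffusers; the model ids, audio path, and settings are just examples, and a small board would need much smaller or quantized models to be practical:
```python
# Speech-to-image sketch: transcribe the spoken request with Whisper, feed the text to SD.
import torch
import whisper
from diffusers import StableDiffusionPipeline

stt = whisper.load_model("base")                      # tiny/base for low-power hardware
prompt = stt.transcribe("request.wav")["text"]        # e.g. "a red dragon over a castle"

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",               # example model id
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(prompt, num_inference_steps=25).images[0]
image.save("out.png")
```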
>>
>>103083338
>>103083371
I know how to do it, I just don't want to do it myself.
>>
>>103083371
isn't SD too heavy for that thing?
>>
>>103083401
Lmao as if.
>>103083405
Nah, SD can run on potatoes now you just have to wait a while.
>>
What's currently the best **uncensored** ≤8GB model? I want to use it as an expensive spellcheck/text corrector, but I don't want it to give comments or straight up remove bad words from the text.
For example if I input:
>so theres this nigger you know nigger john he is areal dumb fucking nigger
It should output:
>So there's this nigger, you know, nigger John? He is a real dumb fucking nigger.
>>
>>103082765
Might be better to go for 5th gen Xeon scalable. It at least has AMX.
>>
>>103082696
>Elections are ending
>so sam is going to release level 2 strawberry reasoning AGI to change the world
Holy fuck
>>
>>103082287
>22B like Sonnet 3.5
source?
>>
Hello newfags
>>
Not even this influx of newfriends can save /lmg/; we truly have stagnated.
>>
>>103083546
ministral 8b, maybe
>>
File: buggedcpp.png (441 KB, 449x407)
>>103083781
>>
I'm using an 8GB 2070 super to play with models. I also have a 4GB 770 lying around. Would there be any benefit to adding the 770 to my rig?
>>
>>103082696
>I'm hecking feeling it...
>It's so big, beautiful and BLACK...
>The BBC... I mean the AGI!
>>
is cpumaxxing worth it in any facet? i know if you add a gpu you can get decent prompt processing speed as well. but i think building a dual genoa = $5-8k. i don't need hyper-speed. i just want to use big models and not wait 25m for a response without having a massive space heater that needs dual psus to function.
>>
>>103083833
Go back >>>/pol/troon
>>
>>103083836
It wasn't worth it for me. It's not that the speed isn't nice, it's just that the big models are kinda meh. 20% better largestral is not worth 8k. Hopefully something in the future comes out that will justify my purchase.
>>
>>103082696
why is sama such an underage reddit fuck? jesus christ, this "marketing" is just sad
>>
File: test.png (189 KB, 2248x1102)
>>103083546
>>103083781 (me)
llamabros...
>>
will you guys use llama4 if it's more pozzed but has bitnet?
>>
>>103084075
No.
>>
>>103084075
Yes.
>>
>>103084075
Maybe.
>>
>>103084060
WTF? Qwen didn't complain? Didn't expect that. Do you think 8gb quant of Nemostral would do a better job than Qwen?
>>
>>103084105
>>103084108
>t. cuck
>>
>>103084075
It won't use bitnet. End of question. You'll get your basic bitch transformer model with some more multimodality stapled on (3B, 95B) and shut up.
>>
>>103084060
It's for your safety chud
>>
what do you guys use to monitor VRAM usage under GNU+Penguin?
>>
>>103084119
Are we going to get all of the modalities this time or just image input again?
>>
>>103084075
Base model or Instruct? I don't care about instruct as long as base is uncucked. l3.1 and qwen2.5 have garbage bases so fuck them.
>>
>>103084136
nvidia-smi
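For a live view rather than a one-shot snapshot, something like this works (standard nvidia-smi query flags):
```sh
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv
```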
>>
>>103084150
thanks buddy
>>
>>103082696
fuck off Sam
>>
>>103084060
polchuds should be permanently banned off this site.
>>
>>103084137
Just input
>>
>>103083847
>>103084178
Hi sama. Do you feel the AGI? Still upset about regulatory capture failing? Will Trump fuck you over if he wins? Of course he will! Elon will be winning non-stop once he is in power. XAI will be the standard, not ChatGPT. How does that make you feel sama? Wanna cry? Wanna spam? Wanna sneed? Oh wait, you can't sneed, totally forgot about it.
>>
>>103084111
nemo 12b is really good for nsfw content and its base model is less censored than qwen
>>
File: ComfyUI_00719_.png (1.12 MB, 1024x1024)
>>103084178
>Faggot who wants no-no words removed from LLM lexicon also wants to silence anyone who disagrees
Pottery
>>
>>103084372
Oh so now we are le based and redpilled rightoids here, nice LARP bro!
>>
>>103084416
Answer this sama >>103084243
>>
>>103084441
Take your meds bro, you are hallucinating things now.
>>
File: 1730739430341.jpg (339 KB, 1024x1536)
>>103077338
fuck a miku, choke a miku, roundhouse plap a miku
>>
>>103084484
miku execution by hanging
>>
>>103084372
NTA but I'm not interested in American culture war bullshit and /g/ would improve dramatically if the mods did their job and actually enforced the rules that already exist.
>>
File: 4x.gif (30 KB, 264x128)
how do i expose my koboldcpp api to the internet without using the cloudflare tunnel option?
it has to be a static link so i can hardcode it into my software
if there's a way to do this with other backends that's also fine, but i enjoy the token count option you get with the koboldcpp api
>>
>>103084537
No-ip with port forwarding, ngrok tunnel.
Assuming you don't have a static ip.
>>
>>103084562
i have a static ip and forwarding worked
>>
>>103084625
Yeah, the port forwarding (most likely) is necessary with a static ip too.
>>
>>103084666
this is an epiphany of how networking works to me
>>
o1 signals an end to "AI equality".

"America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke." - Warhol

This is true of GPT models, but not o1
https://x.com/DavidSKrueger/status/1852818742650282431
>>
>>103084484
We will always be loved by Miku.
>>
>>103084075
Bitnet isn't real, stop huffing copium already you easily impressionable cucks
>>
>>103084644
What's that white stuff
>>
File: 1730739268469707.jpg (172 KB, 1206x1633)
>>103084708
>Super grok election model
It's happening
>>
File: 1713599244640953.png (1.16 MB, 734x660)
>>103084644
>>
>>103084699
I feel that.
I'm not a big network guy. Everything I know I learned by tinkering.
>>
>>103084903
Probably going to be 1T, with no GQA, so you need multiple clusters to run it at more than 2K context.
>>
File: 124124457658679.png (970 KB, 1024x1024)
this is an uncannily realistic self-portrait created by x grok agi
>>
File: hunyuan-standard-256k.png (81 KB, 749x678)
NEW CHINESE MODEL "hunyuan-standard-256k" SPOTTED ON LMARENA! 256k context? Big if true. Significant if open-weights.
>>
>>103085140
256K claimed context, that means 50K actual context, not bad.
>>
>>103085140
are we back?
>>
>>103085140
all the context in the world doesn't help for RP as long as LLMs are still utterly terrible at writing anything that isn't a self-contained scene.
>>
Bros I don't get it. Sometimes when I start text-generation-webui with Rocinante-12B I get around 20t/s on my 1080ti. Other times I start it and I get around 4t/s.
I offload all 41 layers to my gpu. 9.7/11.2GB vram is in use so I'm not overloading the VRAM. I have it set to use 12 threads since I have a 6 core cpu.
Once when I reset the thread count it magically went back to 20t/s, but it won't work again no matter how much I try. I'm using the exact same prompt, settings, and even the same other programs open on my desktop for each test.
Sometimes I start my PC in the morning and it's magically fast until I reboot it then it's slow again. Exact same FUCKING SETTINGS. How the fuck can I track down what's taking 80% of my t/s?
>>
>>103085140
quick google search:
>Proprietary model
>launched back in March
Might be resurfacing because maybe they intend to make it open weights but who knows.
>>
File: 1705509337800688.jpg (9 KB, 198x206)
Sasuga retards. /lmg/ is now worse than /aicg/, still can't believe it. Kill yourselves faggots.
>>
>>103085234
Still upset, sAlty Sam? Not feeling the AGI? Seethe harder and maybe, just maybe, Fuhrer Trump will show some mercy.
>>
>>103085281
Weird obsession with sam altman, must be your gay urges kicking in.
>>
File: ada.jpg (55 KB, 573x729)
>>103085310
You aren't fooling anyone, sama. You'll be locked up together with other big tech communists.
#TND #MAGA2024 #WWG1WGA #TheStormIsHere #Trump
>>
>>103085374
Go back to your polskin containment board, you are not welcome here.
>>
>>103085230
Thanks for taking part in this achievement.
>>
>/aicg/ is just shitposts about proxies, keys, and which cloud model is shittier
>/lmg/ is just dead
Grim. You would think the image gen threads might be a bit better considering all the new toys they're getting but it's a dumpsterfire or also dead in those generals too.
I blame blackrock and nipmoot.
>>
I, for one, blame the sloptuners for not making their datasets 100% open.
I hate people who chase clout instead of wanting the better of all.
That's why we don't have nice things.
>>
File: have_a_flower.jpg (724 KB, 1080x1080)
Anyone experience reduced quality with KV-cache quantization? Honestly, I can't tell any difference in responses after turning it on - but much more free VRAM. Pretty nice.
>>
>>103085514
It's because the mods let you shit up all the AI threads with impunity so people have just stopped bothering to show up.
>>
Hi /v/.
>>
>>103085539
I did test it some ages ago and noticed it had issues recalling things from context. Don't know if it's better nowadays.
>>
>>103085221
I have a similar problem.
I don't have a solution :(

Run a very small model, and look at your cuda usage.
Then run your usual model and look at your cuda usage.

For me:
- very small model: 90% cuda usage.
- usual model I want to run: 50%. Sometimes 60% if I kill ollama and restart. One time 80%.

My guess is that some of the vram is being used by the OS for something.

Loading in a huge model that takes up all your vram,
and then loading in the model you want to use immediately after (which unloads the huge model)
sometimes helps.

My system only has the one gpu in it.
No integrated graphics.

To see if makes any difference,
I might try installing a cheap card for windows to use,
so that my ai s/w can use my nice card without interference.
>>
>>
>>103079112
You went that long in a chat with rocinante? Which one are you using? 1.1? With what settings? I had bad luck with it.
>>
>>103085140
Is it slopped tho
>>
any good model that will take my README.md and fix grammar and style? up to 13B.
>>
>>103085785
See >>103085487
>>
>>103084060
Possibly an interesting way of stopping AI assistants from scraping your page?
>>
For some reason, llama.cpp only seems able to use 75% of my VRAM. Is that the intended behavior?
>>
>>103085893
llama.cpp itself does not determine how much VRAM to use.
It relies on the user to specify the number of layers to load into VRAM.
However, koboldcpp and ollama (and probably more downstream projects) try to estimate how many layers will fit automatically.
These estimates are typically poor and leave a lot of VRAM unused.
>>
>>103085893
Are you using 25% of it to have four panoramic displays surrounding your battlestation?
>>
no one has managed to make a finetune of the new mistral small yet that actually feels like a significantly changed model
it seems to be very belligerent, resistant to being altered training

I understand the temptation to say "that's just what Mistral models are like" but this one is uniquely frozen even for Mistral imo. like Behemoth actually feels significantly different from normal Largestral. While I have yet to use a Small tune that doesn't still feel like the same model
>>
>>103085637
No Miku. You're not allowed to crush my balls.
>>
>>103086037
*altered by training
>>
>>103086037
Skill issue
>>
>>103085917
I'm using llama.cpp itself rather than a downstream project. I'm manually telling it to offload all the layers to my GPUs. I have 24gb + 12gb of VRAM between my GPUs, but attempting to load a model that's larger than ~18GB throws an error about not having enough memory

>>103085938
No, I have my monitor plugged directly into the motherboard, so I think that's using the integrated graphics.
>>
File: tts.png (105 KB, 1364x456)
It's happening... eventually...
>>
>>103086087
Sounds worse than maskgct or fish-speech https://x.com/reach_vb/status/1853475883706614232
>>
>>103086087
based gg waiting for a true multimodal and not a bullshit adapter implementation
>>
>>103085893
>>103086075
There's 3 things that use your vram, number of layers in vram, context size, and prompt processing batch size.
Try playing around with all three one at a time.
>>
>>103086075
Unless you are manually setting a tensor split it should distribute the model correctly automatically.
Are you also taking the memory for context into account?
>>
>>103086117
I don't care as long as it doesn't need python. I'm using piper on a vm and while it works perfectly, it's clunky. I want ggml-based tts.
>>
>>103086117
https://x.com/reach_vb/status/1853486414798733314
>>
It's been over a month and there still is no vision support for llama 3.2 in llama.cpp. Also, there seems to be no work being done to make ministral run properly at long contexts.

Should I just give up on llama.cpp and learn how to use vllm or something?
>>
>>103082877
Flux can do that, or a computer-use llm might be able to use stable diffusion for you
>>
>>103086427
>Should I just give up on llama.cpp and learn how to use vllm or something?
Yes. Install vllm or something and use it.
>>
>>103086427
>Should I just give up on llama.cpp and learn how to use vllm or something
You only have yourself to blame for not doing that yet.
>>
>>103086460
>>103086468
Yeah. I guess you're right. I've been spoiled too much by ooba/kobold. I really don't want to have to set up vllm but I may as well get used to it now.
>>
Refugee discord when?
>>
>>103086427
You are already on troonix so it doesn't matter troon.
>>
>>103086537
No, please, no! I'm too old to get groomed!
>>
>>103086552
What?
>>
>>103086561
vllm needs troonix
>>
>>103086571
It's called GNU/Linux.
>>
>>103084531
>improve dramatically
Just like LLM's. I love it when companies remove all wrongthink from my LLM's.
>>
>>103086537
We already have a discord lmao
>>
File: 1699618013126030.png (852 KB, 942x492)
>>103086586
Good morning saaar!
>>
>>103086552
>linux bad
Get out.
>>
File: file.png (502 KB, 700x441)
>>103086586
>>103086621
It always starts innocently. You want to run an llm loader or emulate some switch. And then before you know it your twink boss fires you for being a harassing "lesbian".
>>
>>103086621
Follow your own advice bro, get out and start new daily dilat- ahem, debugging session with your server oriented OS.
>>
>>103086552
>>103086571
Seek help, you're mentally ill
>>
>“During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks — at a fraction of the cost. As a result, we’ve increased pricing for Claude 3.5 Haiku to reflect its increase in intelligence,” Anthropic wrote in a post on X.
Fucking Jews
>>
>>103086739
How many parameters is Haiku supposed to have? God, please someone leak it.
>>
>>103086739
*Fucking Americans
>>
What options are available for training a voice generator on given samples? I want to give a model some .mp3 samples and then generate speech from text. Can't find anything on it.
>>
i feel compelled to tell you that i'm not dead nor she
-mr. z
>>
>>103086037
Have you tried SorcererLM
>>
>>103086739
They will charge whatever the market is willing to pay
>>
>>103086834
is there a difference?
>>
>>103085595
>My guess is that some of the vram is being used by the OS for something.
It is but I'm always using small models quantized to 4_k_m so there's plenty of room to fit in my VRAM. Current usage is 9805MiB / 11264MiB with only around 1-1.5GB taken by the os.
I checked the box for no_offload_kqv and it sped things up quite a bit for a while, but now reloading the model with it checked or unchecked is still slow. It's just strange because there's no difference in debug output between when it's fast and it's slow, it's exactly the same but 5x slower for no discernable reason.
This bug and the bugs I've had with AUTOMATIC1111 being slow are the main reasons I've just not played around with AI models for a while. Shit just never works long enough to really get into it.
>>
>>103086117
About fish-speech https://x.com/cocktailpeanut/status/1853512204118540625 It's also light on vram; maskgct eats up to 40gb if you send it a wall of text.
>>
>>103086117
Miss me with your shit. GPT-SoVITS is already the best there is by a mile
>>
Is there a good way to progress a chat after a longer session? I find that after 20-30 minutes the character just locks up and will repeat itself. The normal temperature increases, repeat penalty stuff doesn't seem to work. I am thinking RAG with a generic chapter 2 character.
>>
>>103086075
my stupid technique has worked so far. Although it limits to 24G

1. build llama-server from scratch (not sure if this helped for memory, but I was missing features)
2. CUDA_VISIBLE_DEVICES=1 (or your target card)
3. --split-mode none
4. --gpu-layers to a stupid high number and let god sort it out. I use 300.

I managed to squeeze magnum-v4-27b-Q6_K_L.gguf in my 3090. It is 22G
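Putting those steps together, the full invocation looks roughly like this (model path and layer count taken from the post above; the context size is just an example):
```sh
CUDA_VISIBLE_DEVICES=1 ./llama-server \
    -m magnum-v4-27b-Q6_K_L.gguf \
    --split-mode none \
    --gpu-layers 300 \
    -c 8192
```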
>>
>>103087592
They tend to do that when they realize you are a newfag.
>>
>>103087678
>doesn't know answer
>activates you stupid response
I have been here an entire 2 days after I learned about this thread on reddit. I know that is the pattern here. Your mean homosexual names won't deter me.
>>
>>103087592
You've probably hit your context limit.
Increase the context and retry and if it's suddenly smart again, you know what you're up against.
At some point though you'll run out of memory and be forced to concede.
You can try to have it summarize, and start a new session with the summary in the document hoping (praying) that it'll make use of it and stay coherent. (Good luck.)
>>
none of you even know what VRAM stands for
protip: it's not what you think it is
>>
>>103087377
With reasonable quants fish is under 2GB
>>
>>103087746
thanks. I am way past my context limit, like 4 to 5 times. I have had my limit at 16K and it is alright looping past 16K and 32K and then starts locking at 48K and beyond.

I will start playing with summary.

I have seen some stuff about rope, but not sure where to start with that.
>>
>>103087749
benis
>>
>>103087749
Vagina RAM.
>>
>>103087844
Don't bother. Depending on what model you're using, it might be best to do a summary of events/character actions/feelings/whatever to cut down on token usage, and look at using a larger model depending on which one you're starting with.
>>
>>103087844
I remember the first time I got a really good story going. There was a macguffin in the beginning that the AI's RP character was really on about, and after what seemed like a really neat, long scene, I mention the object and the AI acts like it's something new.
I felt like the character died.
>>
>>103078583
>Not going to try this
>>
>>103084484
migu
>>
mikusex :3
>>
>>103087902
>do a summary of events/character actions/feelings/whatever to cut down on token usage
I wonder if I could build something that monitors chat logs and changes the system prompt after a certain amount. I know my character cards tend to be long, it seems necessary though. A trim at 4K could help a bunch.

>>103087912
it sucks a lot. I will periodically mention things just to keep them in context. "Is that jewel still shining strangely {{char}}?" It seems to be working, but this all is smoke and mirrors. It does break immersion a little, but it is better than the hard stop if you exceed the window completely.

I tried a resurrection by deleting half the log and loading it again. It didn't feel right and worked very poorly.
>>
Mikulove!
>>
Is there any frontend that will create a new tree element for you if you edit the reply? I use the edit/continue a lot, and there is no way to do this in SillyTavern currently.
>>
>>103088016
You can use kobold/silly/other chat UI that shows token count and then just edit the convo by taking out the last X lines, summarizing it, and putting it back in.

That's the easy way. SillyTavern tried doing something more complex, but they took it out since it didn't work that well.

An alternative more complex thing is doing like the character card v3 standard implementation, where each character has an accompanying DB/info collection regarding them, and then extending that over time as the conversation develops.
https://github.com/kwaroran/character-card-spec-v3

But i don't really do rp stuff so this is just what I've found trying to figure out coming up with a story generator.
>>
when will this meme of pretending to be retarded die?
>>
>>103088016
I simply started using Mistral Large. It's very slow on my vramlet shit box, but it runs a very long time before it guesses wrong about the story so far.

Though at that point it's so long that any summary is unwieldy, too.

>>103088016
>I wonder if I could build something
I've had that thought, too. And probably so has everybody else who has spent a weekend looking at Python tutorials. But as >>103088067
mentioned, it's probably a lot harder to get right, if it's doable at all, than it seems. So I'm not prioritizing such a project.
>>
>>103084075
I will use it only if it's not heavily censored. If they focus too much on removing 'toxicity', then it's useless.
>>
>>103088067
I might have a closer look at this spec. Character cards are still the wild west right now.

>>103088102
>Mistral Large
respect sir. I just can't do it. I would rather waste dozens of hours trying to fix it than wait 30 seconds for a response.

>it's probably a lot harder to get right
I think the problem is that SillyTavern and such have to handle all cases. It would probably be very easy to put in a hack. You hard code it for 4K and just don't use models that are under 4K.

Projects are rough. I have too many goals and just seem to wander around. I want to fix that TTS/Image gen bug in ST and implement that new TextToVoice thing I saw on hackernews and ..... I just end up fixing things for myself with duct tape. It really sucks. I am also tired of getting my PRs rejected and "re-written" for no reason outside of the maintainer just not wanting them.
>>
>>103084111
Qwen is censored, but in a different way. Keep in mind the model is Chinese. The Chinese are not infected by identity politics and tend to be openly racist towards blacks, so I would expect a Chinese model to have no problem dropping N-Bombs.

Start talking about Taiwan, though, and I bet you'll quickly see the censorship.
>>
>>103088193
https://github.com/malfoyslastname/character-card-spec-v2
Use v2 to start with.
>>
>>103088238
>The Chinese are not infected by identity politics

>>102447861
>Oh I should mention, qwen VL will NEVER mention a person's gender. Even when directly instructed to do so, as I did in my example. It's always "person", "they", "them". And it will never mention anything related to NSFW stuff even when given in the tags. I actually can't believe the fucking chinks are doing this gender neutral troon shit now.
>>102447836
>In this image, there's a person
>>
Chinese pronouns differ somewhat from English pronouns and those of other Indo-European languages. For instance, there is no differentiation in the spoken language between "he", "she" and "it" (though a written difference was introduced after contact with the West), and pronouns are not inflected to indicate whether they are the subject or object of a sentence.

source: https://en.wikipedia.org/wiki/Chinese_pronouns

Can you "men" stop looking for the boogy man everywhere. Sometimes shit is just broken.
>>
>>103088416
If it has been trained on English, the coarseness of Chinese is no defense.
>>
>>103088416
This talks about the English portion of the model thoughever
>>
>>103088238
One key distinction is that while Llama often incorporates discussions on diversity, inclusivity, and consent and uses they/them pronouns, Qwen does not specifically insert the topic of Taiwan into its narratives.
>>
>>103088436
this >>103088445 is censorship. It makes sense that it is censorship. They don't need a defense about screwing up or even just being lazy about pronouns they don't give a shit about.

There are plenty of scary things. You don't need to claim everything is.

>>103088441
yes. ESL people (not the /pol/ version of ESL) may have issues training an English model. It is more than just loading a dataset and the machine goes whirrrrrrrrr. The humans involved will shape how it goes and get things wrong.
>>
How do I get a model to write more than a few lines in a role-play response? I've had this issue since MythoMax despite playing with prompts and params. I'm currently on Mistral-Nemo-12B-Instruct.
>>
>>103088565
Tell it to write longer replies.
Aside from that, each model seems to have its own idea of how long an RP response should be, from a few lines to hold my beer while I write a whole fucking novel, you don't mind that I write your character, too, right, of course not.
>>
>>103083824
bump
>>
>>103088580
I tell it to write longer in both the prompt and author's note. It writes like two lines of pretty good stuff and then that's it. Even using the continue, the model will ask me in OOC what to do next cuz it's out of ideas until I drive the story forward.

>>103088609
Mixing GPU architectures like that can cause headaches.
>>
>>103088609
No, your inference speeds will drop to the slowest card in use.
>>
>>103088636
Sounds like you've found the limits of the model, at least in the "aware it's doing an RP" mode. You might be able to assert that it IS the character, but I have a feeling that whatever you do you'll be able to feel whatever template it's settled on for filling out responses.
>>
>>103088666
Is there an RP finetune for Mistral-Nemo-12B-Instruct? I tried Lumimaid and DoryV2 but Nemo is the best I've tried.
>>
>>103088688
Another anon might have a suggestion. I don't stop short of 70B.
>>
>>103088710
Favorite 70b?
>>
>>103088067
a competing v3 spec lol
https://github.com/Bronya-Rand/Prom-Spec-V3/blob/main/Concept.md
>Prom V3 takes what already exists in V2 and RisuAI's V3 and adapts it to be easier to read for application developers to implement in their own codebases without the unnecessary bloat of Risu's assets folder
>>
the absolute state....
>>
>>103088731
Mist Large is my current go to for anything creative writing (NOT for anything requiring truthiness). I have to quant it down to IQ3 and it's pretty slow, but it seems to be able to sweat it out as far as 16k context. (I have a note of a long run that it collapsed at 20k.) Obviously most 70B's are L3.0 and L3.1 spins. Those it's kinda just shop around till you find something that doesn't spit out refusals barely above a whisper. Most recently I've been playing with that L3.1 Nemotron, and it seemed okay for relatively normie RP, but nothing to write home about.

And there's always CR+ I suppose.
>>
>>103088840
>I have to quant it down to IQ3 and it's pretty slow, but it seems to be able to sweat it out as far as 16k context
how much ram + vram do you have?
>>
>>103085514
it's pretty funny when the sharty zoomers whine that the thread they shitposted to death is actually indeed dead. Yeah guys you destroyed one of the few decent places to talk about a very niche subject.
>>
>>103085514
/g/ is a dumpster anyway
>>
>>103085595
>>103087253
Ok so I accidentally left it running while I played a little Factorio and it's back to being fast, no reloading the model or changing settings. Power usage seems the same for me but maybe you're right about cuda usage and it's in some kind of sleep state not using all the cores properly.
>>
What is the best model under 12GB for NER?
>>
File: 1703920815321448.jpg (90 KB, 1024x1024)
>>103077338
>>
>>103084075
People use gpt4o latest and sonnet3.5 or opus 3.0 and those are pozzed as a motherfuck unless you feed it a 1000 token "You are Clau" jb.
>>
>>103089196
A three word prefill is enough to bypass all safety features for Opus unless the key you're using ended up on the Anthropic naughty list and had 'extended safety features' enabled (which usually takes them months of continued abuse to do)
>>
>>103089231
This. Prefill is all you need. Even just {{char}}: is enough for Claude.
>>
>>103087990
>>103088041
loveless migusex
>>
>>103088952
12 on the card, 64 system.
>>
>>103089138
This image is illegal
>Kenzo Fujisue, a member of the Democratic Party of Japan attempted to obtain the rights to use the image of Hatsune Miku in his run for a seat in the Japanese House of Councillors. His hope was that the use of her image would appeal to younger voters. Crypton declined the use of Hatsune Miku’s image for political purposes.
>>
File: 1714001684968551.png (99 KB, 895x946)
https://x.com/si_pbc/status/1853184307063660723

Seems like local 4o is coming sooner rather than later.
>>
>>103088416
Your entire post is basically wrong about everything, but I'm too lazy to elaborate. Pipe down midwit
>>
>>103089631
>first
Nyo
>>
>>103089663
y-you too
>>
Just think about all those dozens of open source cutting edge models that are currently on hold until the elections are over. In just a few days the open LLM sphere will look so very different to what we have now.
By the end of the year talking about "LLaMA-405B", "Mistral Large2", "Qwen2.5-72B" will feel like talking about Pygmalion-6B right now. Models will be so much better.
>>
>>103089976
lol
Even if those revolutionary models did exist, they would not be so obvious as to release them right after the elections. Maybe a month or two later.
>>
File: tmpkjqdbz43.png (351 KB, 512x512)
It's called NoobAI but I have no idea how to use it.
>>
>>103089993
nta. And still, retards will point at a model that just happens to be released after elections and say "see? i told you!!", even if no model is released for the next 12 months.
My prediction for the future is
>In the near future, things will keep happening.
>>
>>103089993
That's why I said "by the end of the year". It'll start subtly with minor players who want to get a head start before this new golden age of LLMs truly starts. There will be groundbreaking stuff amongst these november releases already that will be better than what we have right now + models that truly make use of bitnet and all those other revolutionary improvements that they've been saving. However, it won't be comparable to the insane new models which we'll get at the turn of the year.
November: Serious improvements, first true bitnet models, etc
January: "the next step", as significant as pre-/post-llama open source if not more
>>
>>103090059
nigger
>>
>>103090072
bitch
>>
does llamacpp support text completion in sillytavern?

i keep getting "dry_sequence_breakers must be a non-empty array of strings" when trying to use it, no issues with chat completion api
>>
>>103090162
chat completion and text completion in st do the exact same thing anyway
>>
When will they develop an architecture that is capable of remembering everything that is fed to it? Trying to give current models reasoning is like trying to give insects reasoning. There is no actual reasoning going on in there; it is just that the output improves when given certain input, AKA stimulus. No reasoning can ever happen until the model has an actual honest to god memory that it can rely upon.
>>
>>103089138
Generating paper waste with Miku
>>
>>103090175
whys it complaining about the dry sequence breakers being empty when koboldcpp doesnt care then?

its gotta have something to do with llamacpp then because they're the same prompts
>>
>>103090162
Yeah. I too wonder what "dry_sequence_breakers must be a non-empty array of strings" means. It's so mysterious.
Either disable DRY or put some shit on that list.

>>103090207
Either because it's got some defaults already or because it's disabled by default.

>its gotta have something to do with llamacpp then because they're the same prompts
No. It's you.
>>
I built a tool at work which summarizes information across several of our systems to help management get a unified view on particular situations.
Problem is I used Ollama, and now they want me to build out an API to extend these capabilities to other systems within the company. This use case calls for concurrent asynchronous inference of several models. It will all be served on prem as we do have the hardware for it; I just don't know of a backend framework for serving a scalable LLM endpoint.

Any suggestions? Preferably something close to ollama and/or dockerized
>>
Using koboldcpp (vulkan) and Sillytavern, on certain character cards I run into
>processing prompt [BLAS]
like every few messages for some reason despite being way under total context limit.
I'm using Mistral Nemo Instruct 2407 Q5 K M on a 12gb GPU.
Is there anything about a character card, or a topic I could be exploring that triggers this more often than usual?
Typically this doesn't happen really ever until I hit total context limit and then it will occasionally do it along with context shifting but just on certain cards I'm constantly running into it.
>>
>>103090250
vLLM
>Preferably something close to ollama and/or dockerized
Stop that.
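If you do go the vLLM route, its OpenAI-compatible server is the usual starting point; a minimal sketch (model name, port, and parallelism are just examples), and note it serves one model per instance, so "several models" means several instances behind a router:
```sh
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --port 8000 \
    --tensor-parallel-size 1
# An official vllm/vllm-openai Docker image also exists if containers are required.
```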
>>
>>103090258
there's some random function in ST, also any reference to {{user}} in character defs, system prompt etc. can be a problem
does it happen on swipes?
>>
>>103090258
{{char}} in sys prompt during group chat, or triggering LB?
>>
>Still no good Japanese to English translator LLM outside of paid services full of censorship
Fuck man, Llama 3.1 405 might be the best, but it fucking blows compared to gemini pro 2 and 4o
>>
>>103090059
Sorry but you also said
>In just a few days the open LLM sphere will look so very different to what we have now
So there better be a big release in a couple of days.
"By the end of the year blah blah blah" doesn't invalidate that sentence or change it.
>>
>>103090276
>does it happen on swipes?
Hm not sure but I don't think i've run into that.
thanks I'll look through the card for those.
>>103090282
Not doing group chat. What's LB? I'm relatively new to this stuff.
>>
File: file.png (1 KB, 58x46)
>>103090307
Lorebook / world info, which can have dynamic activation. Some cards have one embedded.
>>
>>103090288
So you are saying that better models than what we have now + bitnet and other improvements won't make the state of models look different than what we have now, even if it's nothing compared to the jump we will make by the turn of the year? I guess my expectations for the future are more humble than yours. To me, even a reasonable improvement + the first true implementation of things like bitnet in big releases qualify as a satisfactory step this month. More will come later, as mentioned.
>>
>>103090330
Oh gotcha, nah I stopped using those because of that and this one doesn't have an embedded one. Good idea though.
>>
>>103090268
>vllm
This popped up quite a bit in my research. I’ll take a look
>>
File: 1709989067827.png (893 KB, 1427x766)
>bitnet
>>
>>103090284
How much time did you waste not learning japanese?
>>
>>103090359
aint going anywhere near that malware
>>
>>103090162
use the staging branch of sillytavern. easy fix to google.
on the other hand, i'm pretty sure sillytavern fucked up prompt caching for llama.cpp. it keeps reprocessing the whole prompt despite nothing changing. i've only found one vague reddit post about the issue. very sad.
>>
>>103090377
>on the other hand, i'm pretty sure sillytavern fucked up prompt caching for llama.cpp
Did you inspect the requests to make sure that the cache_prompt variable is being included?
>>
>>103090377
that fixed it for me, thanks dude

what reddit thread did you find it on?

>>103090405
yeah i checked the output in the sillytavern logs and it had this:

cache_prompt: true
dry_sequence_breakers: [ '\n', ':', '"', '*' ]
>>
>>103090368
I've been learning, actually. Using AI to basically act as a 'native' speaker also really helps when you have no one else to get help from. But this is only limited to very polite Japanese and doesn't help me with a rougher tone. Plus, it's better having something else do the grunt work.
>>
>>103090412
>>103090412
>>103090412
>>
>>103090284
Qwen 2.5 32B and 72B is great though.


