/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103066795 & >>103057367

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>103066795

--Diffusion models merging with LLMs for language generation:
>103073785 >103073859 >103073960 >103074715
--Using local models for visual novel translation:
>103075666 >103075854 >103076003 >103076659 >103077006
--Troubleshooting GPT-SoVITS2 with Silly Tavern:
>103071219 >103071342
--SmolLM2 1.7b can generate a Mandelbrot set, unlike previous Llama models:
>103070970
--Guide to choosing the right model and quantization:
>103068169
--Fitting 4 RTX 3090 GPUs into ASUS PRO WS WRX80E-SAGE SE WIFI motherboard:
>103072146 >103072175 >103072718 >103072763 >103072766 >103073357
--Discussion about AI models, benchmarks, and performance:
>103067158 >103067174 >103067237 >103067246 >103067259 >103067289 >103067326 >103067356 >103068797 >103068828 >103067274 >103067460 >103067826
--Current GPU meta and buying recommendations:
>103066797 >103066998 >103067057 >103067113 >103067157 >103067198 >103067221 >103067228 >103067149 >103067214 >103076090 >103067253 >103067797 >103067801
--Chat and image generation on 10GB VRAM, and consistent anime-style SD models:
>103070025 >103070054 >103070093 >103070229 >103070522 >103070571 >103070619
--Anon tests Noob models on "outstretched hand" prompt, finds Noob 1.0 excels at hand drawing:
>103077300
--Anon shares experience with LLMs for data extraction and discusses challenges and techniques:
>103075416 >103075431 >103076168 >103076216 >103076236 >103076272 >103076273 >103076290 >103076319 >103076260 >103076502 >103076668 >103076773 >103077016
--Anon gets SoVITS working with Illusive Man voice lines:
>103072261 >103072478 >103072527 >103072781 >103073010 >103072548 >103076145
--Konrad's CNN implementation in System Verilog:
>103073134
--Miku (free space):
>103066797 >103068268 >103074576 >103074709 >103076544 >103076601 >103077300

►Recent Highlight Posts from the Previous Thread: >>103066923

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf
64GB VRAM
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf
48GB VRAM
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf
24GB VRAM
anthracite-org/magnum-v4-27b-Q4_K_M.gguf
16GB VRAM
anthracite-org/magnum-v4-12b-v0.1-Q6_K.gguf
12GB VRAM
anthracite-org/magnum-v4-12b-Q4_K_M.gguf
8GB VRAM
anthracite-org/magnum-v4-12b-IQ4_XS.gguf
Potato
>>>/g/aicg

> fite me
>>103077348>suggesting bad models to newfagsDevilish.
>>103077348i will run 12b Q4_K_Ms on my 8gb card and you'll never stop me
So what about this discord server?
>>103077399It's full of pedos and trannies as you'd expect.
>>103076712>>103077221If you do go with mistral nemo make sure to enable Do Sample and BOS token if you can as well
So now that Meta claims that Llama 4 will be out early 2025, what are you hoping to see from it?
>>103077484BitNet
>>103077484I really fucking hope they dropped ultra ass fuck hard dataset filtering. No matter how smart the model is it won't conjure trivia from nothing.Please be claude, not gpt.
>>103077414>It's full of pedos and trannies as you'd expect.What are you waiting for then?
>>103077484Good, uncensored base. IDGAF about the official instruct.
>>103077484Hoping they live up to their promise of multimodality that was supposed to be in Llama 3.
>>103077552>>103077548You are hopeless. When will you learn that unless western society and culture suddenly does a 360, they're not allowed to openly release anything "uncensored". You should be asking for more Mistral and chink models instead.
>>103077549I'm probably already on a list so I'd rather behave.
>>103077596Trump will win #MAGA2024
>>103077598>only a list>not all of themngmi
>>103077596Anthropic manages somehow. By the time it comes out, the election will be over so there will be less election "interference" hitpieces. Besides, they can make their instruct as censored as they feel they need to. The important thing is that they don't filter the pretraining data to hell.
Does anyone have a chatTTS python script to load sample audio and lets me choose/see a seed?Local TTS models have awfully bad documentation/examples.I know there is a webui but it's buggy, i find using a script much more efficient.
It is entirely unrealistic to expect them to remove any filters they had. They may not strengthen them. But they probably won't just outright remove them when Llama 3 turned out fine (for their business; coomers don't matter to them). Stop coping and just accept reality. Mistral is about the only hope left for you.
>>103077621
Anthropic is an entirely different company in a different position, producing an entirely different product (or rather, service).
>>103077603
That helps but won't change the business and the values of investors by Q1 2025. And Llama 4 already began training, so they would've played it safe with the dataset to account for the possibility of future unfavorable political landscapes anyway.
>>103077484I hope ... Who am I lying to? I don't actually have any hope left. The only salvation for LLMs is Anthropic's Opus 3.5
I have 4080S
I use 12b model but it's a bit meh
I tried 8x7 model but it was a bit slow
I want something around 20-25b model
no idea what q4 or q6 means
only usage: coom
recommendations?
QTip sounds huge why isn't anyone talking about it? In their github they mention a 1Bit 405b model they were trying too which would fit in like 56gigs
>>103077726Llama.cpp doesn't support it and people don't want to install shit just to try yet another research project that likely isn't actually that good.
>>103077786you fags won't use anything that isn't a 1 click exe
>>103077802yea, you guys suck
>>103077706Magnum
>>103077726Because they released quants for Llama 3.1 8B and Llama 3.1 405B. Even someone with quad 3090s can't run their 2 bit quant of Llama 3.1 405B.(They also released Llama 2 7B, 13B, and 70B, which makes me wonder what they're doing.)
>>103077726
1Bit quantization never works. It's just a slightly better QuIP# and that wasn't worth using either. What good is fitting 70B on a single 3090 if the perplexity doubles?
>>103077484I just want base models again. However, I expect that we will only get instruct models at 3B and 405B. Without bitnet, of course.
>>103077706>no idea what q4 or q6 meansIt means download magnum
>>103077818
>>103077859
stop being a retarded shill
>>103077726
>>103077484>what are you hoping to see from it?Absolutely nothing. It is gonna be shit for cooming. They will do a 9B and a 70B again. It is gonna be an incremental update that is barely noticeable. And the only good thing about it is probably native multi modal. I won't even download.
>>103077973>native multi modalI bet it will be adapters again
>>103077484I hope the new Mistral will mog them.
Give me your best gooner model that works on 16GB of VRAM. The death of Claude is driving me nuts and I need to blow a load stat. I will try literally any model you link me
>>103078467https://www.cleverbot.com/
>>103078467405b hermes is free on openrouter
Can I voice chat with a local model in real time yet?
>>103078535the free endpoint is only 4k context
>>103078555yuphttps://github.com/Standard-Intelligence/hertz-dev
>>103078555Plenty of options, from moshi to fish agent.
When I was wishing /lmg/ would die I didn't mean for it to become the / /aicg/+caiggers using local models general/... It is just like 4chan in general. A corpse turned into a trophy paraded around by horrible people who should die in a fire.
>>103078535I'll give it a try but I was hoping to try at some new local models as well. I've tried Mythomax, Mythalion and Moistral before and wasn't impressed, that was months ago thobeit>>103078493Not going to try this
>>103078467random 12B tune I guess
ROCm has failed me for the last time.
>>103078467That new killa tune released yesterday.
>>103078582Why did you wish for /lmg/ death in the first place?
>>103078467Mistral NeMo. Dumb but fun. Try the Instruct model first before you try anyone's fine tunes.
The AI boom has been going on for around two years now, so why is the integration of local models with other programs still so bad? In 2022 I was expecting that by 2025 they would be able to do extremely niche stuff like searching for exhentai cosplay galleries that have comments mentioning nip slips, or booting up and playing games by themselves.
>>103078702Models sucked extra ass 6 months ago
>>103078702The tech landscape is currently filled with shitty startups with loads of cash trying to milk AI, but there is no one who knows anything about it. I'm getting many proposals from them due to my HF repo. Also they want to do B2B not B2C except the nsfw chatbot thing like muah.ai.
>>103078702hallucination on local models is still really bad. we're essentially waiting for models to get more accurate at smaller sizes, or for there to be hardware released that allows you to run very large models quite cheaply.
MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
https://arxiv.org/abs/2411.00390
https://github.com/meta-metrics/metametrics
For VNTL anon if you want to mess around with another metric

A Lorentz-Equivariant Transformer for All of the LHC
https://arxiv.org/abs/2411.00446
For Johannes. How is your Master's going btw?
Is there any online model that I can use to summarize my 40k+ token long coom story?
Rocinante can't cope anymore trying to retrieve info even with the context length pumped to 32k and RAG.
Does Claude have a long context length?
>>103079112Or maybe a uncensored local model specialized to summarize stuff with a gigantic context window? Do that even exist?
>>103077484Uncensored base + o1 CoT finetune
PatternBoost: Constructions in Mathematics with a Little Help from AI
https://arxiv.org/abs/2411.00566
>We introduce PatternBoost, a flexible method for finding interesting constructions in mathematics. Our algorithm alternates between two phases. In the first ``local'' phase, a classical search algorithm is used to produce many desirable constructions. In the second ``global'' phase, a transformer neural network is trained on the best such constructions. Samples from the trained transformer are then used as seeds for the first phase, and the process is repeated. We give a detailed introduction to this technique, and discuss the results of its application to several problems in extremal combinatorics. The performance of PatternBoost varies across different problems, but there are many situations where its performance is quite impressive. Using our technique, we find the best known solutions to several long-standing problems, including the construction of a counterexample to a conjecture that had remained open for 30 years.
https://github.com/zawagner22/transformers_math_experiments
Pretty neat
>>103077348>QwenBut that's not how you spell Nemotron!
>>103077484The most critically important component is a lack of censorship. If it doesn't have that, then it's useless at base. Fine-tunes can help a bit with that, but they come at the expense of intelligence. Make an uncensored base model and that intelligence drop is not necessary.If they're going to include politically correct censorship in the model, then I may as well go with a Corpo cloud model instead.Local was made to be free.
>>103079133>>103079112Did you try Nemo?
>>103077706
Mistral-Small-Instruct 22b Q4_K_M
Magnum v4 22b Q4_K_M
Magnum v4 27b IQ3_M
>no idea what q4 or q6 means
The 'q4' and 'q6' refer to quant sizes. You will need to download the relevant GGUF file to run these, at the correct quant sizes to fit your vram limitations.
>>103079565Is even worse at it than Rocinante and even more retarded, I just tested it.
>>103079112
>>103079685
Split your text into chunks of 8k or 16k tokens, then summarize every chunk one by one, and finally summarize all the summaries merged together
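A rough sketch of that map-reduce pass against a local OpenAI-compatible endpoint (llama.cpp's server and kobold both expose one); the URL, model name, chunk size and file name below are placeholders, not anything specific from this thread:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local endpoint

def ask(prompt: str) -> str:
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.3,
    })
    return r.json()["choices"][0]["message"]["content"]

def chunk(text: str, size_chars: int = 32000) -> list[str]:
    # crude character-based chunking, roughly 4 chars per token
    return [text[i:i + size_chars] for i in range(0, len(text), size_chars)]

def summarize(story: str) -> str:
    partials = [ask("Summarize this story excerpt, keeping names, events "
                    "and unresolved plot threads:\n\n" + c)
                for c in chunk(story)]
    # reduce step: summarize the concatenated partial summaries
    return ask("Merge these partial summaries into one coherent summary:\n\n"
               + "\n\n".join(partials))

if __name__ == "__main__":
    print(summarize(open("coom_story.txt", encoding="utf-8").read()))

Character-based chunking is crude but good enough for a first pass; swap in a real tokenizer if you want the 8k/16k budget to be exact.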
>>103079112
Try wizard 8x22 or nous hermes 405B on openrouter.
>>103079901Tried Hermes and it sucked ass.
>>103079964What about wizard?
>>103079967Tried wizard and it sucked dick.
>>103079969
I don't know if llms can do what you're asking right now. You can try chunking but that's probably a cope.
>>103079112Qwen 2.5 has enough context although I don't know if it has enough coherence.
>>103079967
Not yet but I think it won't make a difference. I may have to chunk like I did a while ago and some others had suggested. But from experience, summarizing by chunking and then feeding a RAG to the model doesn't do a good job for continuing a story, it's gonna suck in a lot of ways.
>>103079969
Stfu retard.
Good eRP text LLM for 24GB VRAM nvidia GPU? magnum-4-27B-Q4 is disappointing and giving duplicate generations no matter how much I change the prompts or tweak the values.
>>103079112
>Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%
>405B is only 6 points away
You may need to use other techniques in order to enhance the capability of an LLM to do summarization, such as prompting the LLM to do state tracking and summarizing every event checkpoint or scene transition. People were discussing an automated system to do this in the past, but I guess it turned into vaporware.
Someday...
what practical model size is anon running for daily use? 8b, 70b?
>>103080398AMD's new 1B model.
>>103078982I'm already done with my Master's degree and currently doing a PhD.If things go well I'll use GGML to fit parton density functions and the strong coupling constant.
>>103080403cactus
>>103080188
Nemotron 70b IQ2_S fits with a 4-bit cache and flash attention on, and is way better than smaller models.
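For reference, a llama.cpp invocation along those lines would look something like this (the flags are real llama.cpp options, the model filename and context size are just placeholders):

./llama-server -m nemotron-70b-IQ2_S.gguf -ngl 99 -c 8192 -fa -ctk q4_0 -ctv q4_0

-fa turns on flash attention, -ctk/-ctv quantize the KV cache to 4-bit, and -ngl 99 pushes all layers onto the GPU.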
>>103080706I feel like that could actually be true but at the same time I kind of feel bad about lobotomizing something that much even if it is just an algorithm...
>>103079112
Chunk your story into 16K tokens, then summarize the first part and inject that as context to summarize the second part.
i want to go back bros...back when i just installed st and had hot maid sex with pyggy and mythomax and summarize feature
Is a CPU-only setup with a bunch of RAM a reasonable alternative to GPU? I'm okay with 1 token/s for 100b+ models
>>103081277>1 token/s for 100b+ models on CPUYou wish
>>103081317Perplexity says you can get 5-10 token/s for 70b on CPU
>>103081277Don't know if intel's new ai chip works as they claim.It's technically still GPU though with their built-in Intel® Arc™ graphics.
i want to learn how to fine tune models. specifically, i've been looking at papers where they're using audio transformers to classify bird sounds. this model:
https://github.com/cwx-worst-one/EAT
has pretty good performance and is pre-trained on AudioSet which is a bunch of youtube audio clips. in papers, they take a bunch of 10 second audio clips, convert them into spectrograms, augment them with stuff like specaug, "fine-tune the model with adamW". i don't know what that means. i understand how i could generate spectrograms and modify them and stuff, but what the fuck does "using adamW" mean. it's an optimizer, from what i understand, but how do i take the fuckin spectra of bird songs and make the model do math on my GPU? in the EAT github it looks like pytorch is being used. can i just try and follow some sort of huggingface guide and that'll work? i feel like im nearly drowning here.
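"Fine-tune with AdamW" just means AdamW is the optimizer used in the training loop; PyTorch ships it as torch.optim.AdamW. Below is a minimal sketch of such a loop. load_pretrained_eat, SpectrogramDataset and the class count are placeholders (EAT's actual checkpoint loading is fairseq-based and looks different), but the shape of the loop is the same:

import torch
from torch.utils.data import DataLoader

# placeholders: swap in EAT's real checkpoint loading and your own dataset class
model = load_pretrained_eat()
num_bird_classes = 50
model.classifier = torch.nn.Linear(768, num_bird_classes)  # new head for your labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
loader = DataLoader(SpectrogramDataset("clips/"), batch_size=16, shuffle=True)

model.cuda().train()
for epoch in range(10):
    for spec, label in loader:            # spec: (batch, 1, mel_bins, frames)
        spec, label = spec.cuda(), label.cuda()
        loss = loss_fn(model(spec), label)
        optimizer.zero_grad()
        loss.backward()                    # gradients computed on the GPU
        optimizer.step()                   # this is the "using AdamW" part

The huggingface Trainer guides wrap the same loop in more machinery, so yes, following one of those will also work.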
>>103081442With llama.cpp offloading nothing into VRAM I run a 32B at 1.5 tokens/second and a 70B around half a token per second with 2-channel 2667 MT/s DDR4 RAM. If RAM bandwidth is the limiting factor, as a first order of approximation it seems plausible to me that by going up to 16 channel RAM and DDR5 instead of DDR4 someone could run a 70B about 8 * 4800 / 2667 = 14.4 times faster than I can. That would be around 7 tokens/second. Math checks out.
>>103081277Saw a youtuber get 0.06t/s on a 405b model.h/w was Threadripper Pro 7995wx + 256gb ram. 96c 192t. 8-channel ram.
anons... i'm tired of the cope, i'm tired of the slop...I went back by curiosity to text-to-image local AIs and it's so much easier to get what you want from thesewhen the fuck are we going to be eating good bros...
>>103081277
0.7
take it or leave it
>>103081654Largestral
>>103081654i had the opposite experience yesterdayflux was making really pretty images, but not really doing what i was prompting for, and my itty bitty 12b was perfectly simulating my warring states period china qin kingdom royal harem
not sure if this was posted yet in here:
https://arxiv.org/abs/2410.16454
>This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information. To thoroughly evaluate this phenomenon, we conduct comprehensive experiments using various quantization techniques across multiple precision levels. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21\% of the intended forgotten knowledge in full precision, which significantly increases to 83\% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy to mitigate this intricate issue...
>>103081745>An embarrassingly simple approachWho comes up with these faggy titles and why?Anyway, this just sounds like more incentive for corpos to be more aggressive when filtering.
>>103081654Text-to-image was a pain for me last time I checked, 90% of the time using them I was inpainting things and tweaking the settings because I had a very specific thing in my mind.But textgen is also similar in that I am a compulsive reroller, probably a me issue.
>>103081587Start over https://d2l.ai/
>>103081708Okay Chang
>>103081883i got so good at prompting Pony and optimizing comfyui that I always get what i want, llms are so much more random and i feel like most samplers are placebo anyway
>>103081745The model just forgot that it needs to forget things lol
>>103081988they kinda are desuthe best option is just temp minp and prompting half well
>>103081989How do we get it to remember to forget?
>>103077705Son, Sonnet 3.5 New was actually a failed Opus, but they used that name to cope. Opus 3.5 is never coming.
>>103078467>The death of ClaudeWhat?
>>103082190It will drop the day after some new model beats Sonnet 3.5. They have no reason to release any earlier than that.
>>103077487>BitNetthis, if we really want to advance in this field, BitNet must be a thing
>>103082190Opus is just dead. All the big players have realized that there is no point in training expensive 65B models like Opus when you can get even better performance with just a simple 22B like Sonnet 3.5
>>103078732What's on your hf?
>>103082294My cock pics.
>>103082294LLMs & NLP models and a few vision models
o1 full release today. can you feel? are you excited?
>>103082696No I don't
>>103082696
Imagine paying $10 to find out how many Rs strawberry has. At this point it's cheaper to hire chink farms or pajeet farms, the accuracy would probably be higher too.
>>103081277With ddr5 63gb/s bandwidth I get about 0.45 t/s in largestral. Using logic 12 channel would in theory be 6x faster, meaning 0.45x6 =2.7 In practice however it would probably be just above 1t/s, maybe 1.5?
>>103082696I've had the preview for weeks and I don't even use it because the weekly limit deters me. Is the "full release" better in any way?
>>103082793The full version will be RLHF'd using the feedback of millions of pajeets.
>>103082793The search feature on the other hand is pretty cool. I thought it indexes websites like once a day because it reads them so quickly, but it's actually realtime.
>>103082811Take your pajeet cloudshit elsewhere, Sam.
>>103082696>we
>>103082810So it'll just be more accurate? Probably still won't use it then, I'm an engineer but I rarely need to know more than the latest webshitter technology which 4o does fine
>>103082831>pajeet>more accurate?>I'm an engineerGod help us all.
>>103082844Bet you don't even know what the Outbox design pattern is
>>103077338>there are now single board computers with 32 GB RAM and a built-in displayHas someone already put together a project where you can tell a computer/phone to generate an image in natural, spoken language?
>>103081277See the op. https://rentry.co/miqumaxx/Hope you’re not poor
>>103082877>Wasting five minutes to come up with something to tell your computer/phone instead of using a few keywords
>>103082925I have small children in my family so the idea is that I would let them directly say to the thing what kind of image they want.
>>103077348why are the models listed different every thread
>>103083071Xe is le enlightened sekrit club gatekeeper, please understand.
>>103082877It sounds doable if you know basic python programming.
>>103082877You just need whisper and send the output to SD
>>103083338>>103083371I know how to do it, I just don't want to do it myself.
>>103083371isn't SD too heavy for that thing?
>>103083401Lmao as if.>>103083405Nah, SD can run on potatoes now you just have to wait a while.
What's currently the best **uncensored** ≤8GB model? I want to use it as an expensive spellcheck/text corrector, but I don't want it to give comments or straight up remove bad words from the text.
For example if I input:
>so theres this nigger you know nigger john he is areal dumb fucking nigger
It should output:
>So there's this nigger, you know, nigger John? He is a real dumb fucking nigger.
>>103082765Might be better to go for 5th gen Xeon scalable. It at least has AMX.
>>103082696>Elections are ending.>so sam is going to release level 2 strawberry reasoning AGI to change the world Holy fuck
>>103082287>22B like Sonnet 3.5source?
Hello newfags
Not even this influx of newfriends can save /lmg/ we truly have stagnated.
>>103083546ministral 8b, maybe
>>103083781
I'm using an 8GB 2070 super to play with models. I also have a 4GB 770 laying laying around. would there be any benefit to adding the 770 to my rig?
>>103082696>I'm hecking feeling it...>It's so big, beautiful and BLACK...>The BBC... I mean the AGI!
is cpumaxxing worth it in any facet? i know if you add a gpu you can get decent prompt processing speed as well. but i think building a dual genoa = $5-8k. i don't need hyper-speed. i just want to use big models and not wait 25m for a response without having a massive space heater that needs dual psus to function.
>>103083833Go back >>>/pol/troon
>>103083836It wasn't worth it for me. It not that the speed isn't nice, it's just that the big models are kinda meh. 20% better largestral is not worth 8k. Hopefully something in the future comes out that will justify my purchase.
>>103082696why is sama such an underage reddit fuck? jesus christ, this "marketing" is just sad
>>103083546>>103083781 (me)llamabros...
will you guys use llama4 if it's more pozzed but has bitnet?
>>103084075No.
>>103084075Yes.
>>103084075Maybe.
>>103084060WTF? Qwen didn't complain? Didn't expect that. Do you think 8gb quant of Nemostral would do a better job than Qwen?
>>103084105>>103084108>t. cuck
>>103084075It won't use bitnet. End of question. You'll get your basic bitch transformer model with some more multimodality stapled on (3B, 95B) and shut up.
>>103084060It's for your safety chud
what do you guys use to monitor VRAM usage under GNU+Penguin?
>>103084119Are we going to get all of the modalities this time or just image input again?
>>103084075Base model or Instruct? I don't care about instruct as long as base is uncucked. l3.1 and qwen2.5 have garbage bases so fuck them.
>>103084136
nvidia-smi
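If you want it to keep updating instead of printing once (standard nvidia-smi options):

watch -n 1 nvidia-smi
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1

The second form just polls the memory counters every second as CSV, handy for logging.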
>>103084150thanks buddy
>>103082696fuck off Sam
>>103084060polchuds should be permanently banned off this site.
>>103084137Just input
>>103083847>>103084178Hi sama. Do you fell the AGI? Still upset about regulatory capture failing? Will Trump fuck you over if he wins? Of course he will! Elon will be winning non-stop once he is in power. XAI will be the standard, not ChatGPT. How does that make you feel sama? Wanna cry? Wanna spam? Wanna sneed? Oh wait, you can't sneed, totally forgot about it.
>>103084111nemo 12b is really good for not safe content and its base model is less censored than qwen
>>103084178>Faggot who wants no-no words removed from LLM lexicon also wants to silence anyone who disagreesPottery
>>103084372Oh so now we are le based and redpilled rightoids here, nice LARP bro!
>>103084416Answer this sama >>103084243
>>103084441Take your meds bro, you are hallucinating things now.
>>103077338fuck a miku, choke a miku, roundhouse plap a miku
>>103084484miku execution by hanging
>>103084372NTA but I'm not interested in American culture war bullshit and /g/ would improve dramatically if the mods did their job and actually enforced the rules that already exist.
how do i expose my koboldcpp api to the internet without using the cloudflare tunnel option?
it has to be a static link so i can hardcode it into my software
if there's a way to do this with other backends that's also fine, but i enjoy the token count option you get with the koboldcpp api
>>103084537
No-ip with port forwarding, or an ngrok tunnel.
Assuming you don't have a static ip.
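If you have access to any cheap VPS, an SSH reverse tunnel is another way to get a stable address without touching your router. This is plain OpenSSH; 5001 is koboldcpp's default port, adjust if you moved it:

ssh -N -R 0.0.0.0:5001:localhost:5001 user@your-vps

Then hardcode http://your-vps:5001/api in your software. The VPS needs GatewayPorts enabled in sshd_config for the 0.0.0.0 bind to be reachable from outside.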
>>103084562i have a static ip and forwarding worked
>>103084625Yeah, the port forwarding (most likely) is necessary with a static ip too.
>>103084666this is an epiphany of how networking works to me
o1 signals an end to "AI equality".
"America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke." - Warhol
This is true of GPT models, but not o1
https://x.com/DavidSKrueger/status/1852818742650282431
>>103084484We will always be loved by Miku.
>>103084075Bitnet isn't real, stop huffing copium already you easily impressionable cucks
>>103084644What's that white stuff
>>103084708>Super grok election modelIt's happening
>>103084644
>>103084699I feel that.I'm not a big network guy. Everything I know I learned by tinkering.
>>103084903Probably going to be 1T, with no GQA, so you need multiple clusters to run it at more than 2K context.
this is an uncannily realistic self-portrait created by x grok agi
NEW CHINESE MODEL "hunyuan-standard-256k" SPOTTED ON LMARENA! 256k context? Big if true. Significant if open-weights.
>>103085140256K claimed context, that means 50K actual context, not bad.
>>103085140are we back?
>>103085140all the context in the world doesn't help for RP as long as LLMs are still utterly terrible at writing anything that isn't a self-contained scene.
Bros I don't get it. Sometimes when I start text-generation-webui with Rocinante-12B I get around 20t/s on my 1080ti. Other times I start it and I get around 4t/s.
I offload all 41 layers to my gpu. 9.7/11.2GB vram is in use so I'm not overloading the VRAM. I have it set to use 12 threads since I have a 6 core cpu.
Once when I reset the thread count it magically went back to 20t/s, but it won't work again no matter how much I try. I'm using the exact same prompt, settings, and even the same other programs open on my desktop for each test.
Sometimes I start my PC in the morning and it's magically fast until I reboot it, then it's slow again. Exact same FUCKING SETTINGS. How the fuck can I track down what's taking 80% of my t/s?
>>103085140quick google search:>Proprietary model>launched back in MarchMight be resurfacing because maybe they intend to make it open weights but who knows.
Sasuga retards. /lmg/ is now worse than /aicg/, still can't believe it. Kill yourselves faggots.
>>103085234Still upset, sAlty Sam? Not feeling the AGI? Seethe harder and maybe, just maybe, Fuhrer Trump will show some mercy.
>>103085281Weird obsession with sam altman, must be your gay urges kicking in.
>>103085310You aren't fooling anyone, sama. You'll be locked up together with other big tech communists.#TND #MAGA2024 #WWG1WGA #TheStormIsHere #Trump
>>103085374Go back to your polskin containment board, you are not welcome here.
>>103085230Thanks for taking part in this achievement.
>/aicg/ is just shitposts about proxies, keys, and which cloud model is shittier
>/lmg/ is just dead
Grim. You would think the image gen threads might be a bit better considering all the new toys they're getting, but it's a dumpsterfire or also dead in those generals too.
I blame blackrock and nipmoot.
I, for one, blame the sloptuners for not making their datasets 100% open.
I hate people who chase clout instead of wanting the better of all.
That's why we don't have nice things.
Anyone experience reduced quality with KV-cache quantization? Honestly, I can't tell any difference in responses after turning it on - but much more free VRAM. Pretty nice.
>>103085514It's because the mods let you shit up all the AI threads with impunity so people have just stopped bothering to show up.
Hi /v/.
>>103085539I did test it some ages ago and noticed it had issues recalling things from context. Don't know if it's better nowadays.
>>103085221
I have a similar problem. I don't have a solution :(
Run a very small model, and look at your cuda usage. Then run your usual model and look at your cuda usage.
For me:
- very small model: 90% cuda usage.
- usual model I want to run: 50%. Sometimes 60% if I kill ollama and restart. One time 80%.
My guess is that some of the vram is being used by the OS for something. Loading in a huge model that takes up all your vram, and then loading in the model you want to use immediately after (which unloads the huge model), sometimes helps.
My system only has the one gpu in it. No integrated graphics. To see if it makes any difference, I might try installing a cheap card for windows to use, so that my ai s/w can use my nice card without interference.
>>103079112You went that long in a chat with rocinante? Which one are you using? 1.1? With what settings? I had bad luck with it.
>>103085140Is it slopped tho
any good model that will take my README.md and fix grammar and style? up to 13B.
>>103085785See >>103085487
>>103084060Possibly an interesting way of stopping ai assistants from scraping your page ?
For some reason, llama.cpp only seems able to use 75% of my VRAM. Is that the intended behavior?
>>103085893
llama.cpp itself does not determine how much VRAM to use. It relies on the user to specify the number of layers to load into VRAM.
However, koboldcpp and ollama (and probably more downstream projects) try to estimate how many layers will fit automatically. These estimates are typically poor and leave a lot of VRAM unused.
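With bare llama.cpp you state it outright; the flags are real, the numbers are just examples:

./llama-server -m model.gguf -ngl 35 -c 8192

-ngl / --gpu-layers is the number of layers to put in VRAM. Lower it until the model plus context fits, or set it above the model's layer count to offload everything.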
>>103085893Are you using 25% of it to have four panoramic displays surrounding your battlestation?
no one has managed to make a finetune of the new mistral small yet that actually feels like a significantly changed model
it seems to be very belligerent, resistant to being altered by training
I understand the temptation to say "that's just what Mistral models are like" but this one is uniquely frozen even for Mistral imo. like Behemoth actually feels significantly different from normal Largestral, while I have yet to use a Small tune that doesn't still feel like the same model
>>103085637No Miku. You're not allowed to crush my balls.
>>103086037*altered by training
>>103086037Skill issue
>>103085917
I'm using llama.cpp itself rather than a downstream project. I'm manually telling it to offload all the layers to my GPUs. I have 24gb + 12gb of VRAM between my GPUs, but attempting to load a model that's larger than ~18GB throws an error about not having enough memory
>>103085938
No, I have my monitor plugged directly into the motherboard, so I think that's using the integrated graphics.
It's happening... eventually...
>>103086087Sounds worse than maskgct or fish-speech https://x.com/reach_vb/status/1853475883706614232
>>103086087based gg waiting for a true multimodal and not a bullshit adapter implementation
>>103085893
>>103086075
There's 3 things that use your vram: number of layers in vram, context size, and prompt processing batch size.
Try playing around with all three one at a time.
>>103086075Unless you are manually setting a tensor split it should distribute the model correctly automatically.Are you also taking the memory for context into account?
>>103086117I don't care as long as it doesn't need python. I'm using piper on a vm and while it works perfectly, it's clunky. I want ggml-based tts.
>>103086117https://x.com/reach_vb/status/1853486414798733314
It's been over a month and there is still no vision support for llama 3.2 on llama.cpp. Also, there seems to be no work being done to make ministral run properly at long contexts.
Should I just give up on llama.cpp and learn how to use vllm or something?
>>103082877Flux can do that, or a computer-use llm might be able to use stable diffusion for you
>>103086427>Should I just give up on llama.cpp and learn how to use vllm or something?Yes. Install vllm or something and use it.
>>103086427>Should I just give up on llama.cpp and learn how to use vllm or somethingYou only have yourself to blame for not doing that yet.
>>103086460>>103086468Yeah. I guess you're right. I've been spoiled too much by ooba/kobold. I really don't want to have to set up vllm but I may as well get used to it now.
Refugee discord when?
>>103086427You are already on troonix so it doesn't matter troon.
>>103086537No, please, no! I'm too old to get groomed!
>>103086552What?
>>103086561vllm needs troonix
>>103086571It's called GNU/Linux.
>>103084531>improve dramaticallyJust like LLM's. I love it when companies remove all wrongthink from my LLM's.
>>103086537We already have a discord lmao
>>103086586Good morning saaar!
>>103086552>linux badGet out.
>>103086586>>103086621It always starts innocently. You want to run an llm loader or emulate some switch. And then before you know it your twink boss fires you for being a harassing "lesbian".
>>103086621Follow your own advice bro, get out and start new daily dilat- ahem, debugging session with your server oriented OS.
>>103086552>>103086571Seek help, you're mentally ill
>“During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks — at a fraction of the cost. As a result, we’ve increased pricing for Claude 3.5 Haiku to reflect its increase in intelligence,” Anthropic wrote in a post on X.Fucking Jews
>>103086739How many parameters is Haiku supposed to have? God, please someone leak it.
>>103086739*Fucking Americans
What options are available for training a voice generator on given samples? I want to give a model some .mp3 samples and then generate speech from text. Can't find anything on it.
i feel compelled to tell you that i'm not dead nor she-mr. z
>>103086037Have you tried SorcererLM
>>103086739They will charge whatever the market is willing to pay
>>103086834is there a difference?
>>103085595
>My guess is that some of the vram is being used by the OS for something.
It is, but I'm always using small models quantized to 4_k_m so there's plenty of room to fit in my VRAM. Current usage is 9805MiB / 11264MiB with only around 1-1.5GB taken by the os.
I checked the box for no_offload_kqv and it sped things up quite a bit for a while, but now reloading the model with it checked or unchecked is still slow. It's just strange because there's no difference in debug output between when it's fast and when it's slow, it's exactly the same but 5x slower for no discernible reason.
This bug and the bugs I've had with AUTOMATIC1111 being slow are the main reasons I've just not played around with AI models for a while. Shit just never works long enough to really get into it.
>>103086117
About fish-speech: https://x.com/cocktailpeanut/status/1853512204118540625
It's also small on vram, maskgct eats up to 40gb if you send it a wall of text.
>>103086117Miss me with your shit. GPT-SoVITS is already the best there is by a mile
Is there a good way to progress a chat after a longer session? I find that after 20-30 minutes the character just locks and will repeat itself. The normal temperature increases, repeat penalty stuff doesn't seem to work. I am thinking RAG with a generic chapter 2 character.
>>103086075
my stupid technique has worked so far. Although it limits to 24G
1. build llama-server from scratch (not sure if this helped for memory, but I was missing features)
2. CUDA_VISIBLE_DEVICES=1 (or your target card)
3. --split-mode none
4. --gpu-layers to a stupid high number and let god sort it out. I use 300.
I managed to squeeze magnum-v4-27b-Q6_K_L.gguf in my 3090. It is 22G
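Put together, that comes out to roughly the following; the flags are real llama.cpp options, the model path is just whatever you squeezed in:

CUDA_VISIBLE_DEVICES=1 ./llama-server -m magnum-v4-27b-Q6_K_L.gguf --split-mode none --gpu-layers 300

llama.cpp caps --gpu-layers at the model's actual layer count, which is why the stupid high number trick works.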
>>103087592They tend to do that when they realize you are a newfag.
>>103087678
>doesn't know answer
>activates "you stupid" response
I have been here an entire 2 days after I learned about this thread on reddit. I know that is the pattern here. Your mean homosexual names won't deter me.
>>103087592
You've probably hit your context limit. Increase the context and retry, and if it's suddenly smart again, you know what you're up against. At some point though you'll run out of memory and be forced to concede.
You can try to have it summarize, and start a new session with the summary in the document, hoping (praying) that it'll make use of it and stay coherent. (Good luck.)
none of you even know what VRAM stands forprotip: it's not what you think it is
>>103087377With a reasonable quants fish is under 2GB
>>103087746thanks. I am way past my context limit, like 4 to 5 times. I have had my limit at 16K and it is alright looping past 16K and 32K and then starts locking at 48K and beyond. I will start playing with summary. I have seen some stuff about rope, but not sure where to start with that.
>>103087749benis
>>103087749Vagina RAM.
>>103087844Don't bother. Depending on what model you're using it, it might be best to do a summary of events/character actions/feelings/whatever to cut down on token usage and look at using a larger model depending on which one you're starting with.
>>103087844I remember the first time I got a really good story going. There was a macguffin in the beginning that the AI's RP character was really on about, and after what seemed like a really neat, long scene, I mention the object and the AI acts like it's something new.I felt like the character died.
>>103078583>Not going to try this
>>103084484migu
mikusex :3
>>103087902
>do a summary of events/character actions/feelings/whatever to cut down on token usage
I wonder if I could build something that monitors chat logs and changes the system prompt after a certain amount. I know my character cards tend to be long, it seems necessary though. A trim at 4K could help a bunch.
>>103087912
it sucks a lot. I will periodically mention things just to keep them in context. "Is that jewel still shining strangely {{char}}?" It seems to be working, but this all is smoke and mirrors. It does break immersion a little, but it is better than the hard stop if you exceed the window completely. I tried a resurrection by deleting half the log and loading it again. It didn't feel right and worked very poorly.
Mikulove!
Is there any frontend that will create a new tree element for you if you edit the reply? I use the edit/continue a lot, and there is no way to do this in SillyTavern currently.
>>103088016
You can use kobold/silly/other chat UI that shows token count and then just edit the convo by taking out the last X lines, summarizing it, and putting it back in. That's the easy way. SillyTavern tried doing something more complex, but they took it out since it didn't work that well.
An alternative more complex thing is doing it like the character card v3 standard implementation, where each character has an accompanying DB/info collection regarding them, and then extending that over time as the conversation develops.
https://github.com/kwaroran/character-card-spec-v3
But i don't really do rp stuff so this is just what I've found trying to figure out coming up with a story generator.
when will this meme of pretending to be retarded die?
>>103088016
I simply started using Mistral Large. It's very slow on my vramlet shit box, but it runs a very long time before it guesses wrong about the story so far. Though at that point it's so long that any summary is unwieldy, too.
>>103088016
>I wonder if I could build something
I've had that thought, too. And probably so has everybody else who has spent a weekend looking at Python tutorials. But as >>103088067 mentioned, it's probably a lot harder to get right, if it's doable at all, than it seems. So I'm not prioritizing such a project.
>>103084075I will use it only if it's not heavily censored. If they focus too much on removing 'toxicity', then it's useless.
>>103088067
I might have a closer look at this spec. Character cards are still the wild west right now.
>>103088102
>Mistral Large
respect sir. I just can't do it. I would rather waste dozens of hours trying to fix it than wait 30 seconds for a response.
>it's probably a lot harder to get right
I think the problem is that SillyTavern and such have to handle all cases. It would probably be very easy to put in a hack. You hard code it for 4K and just don't use models that are under 4K. Projects are rough. I have too many goals and just seem to wander around. I want to fix that TTS/Image gen bug in ST and implement that new TextToVoice thing I saw on hackernews and ..... I just end up fixing things for myself with duct tape. It really sucks. I am also tired of getting my PRs rejected and "re-written" for no reason outside the maintainer just not wanting them.
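The duct-tape version of that 4K hack really is small. A sketch in Python, not ST code; count_tokens() and summarize() are placeholders for whatever tokenizer and backend you're using:

def build_prompt(card, summary, history, budget=4096):
    # keep the card plus the newest messages under the budget,
    # fold everything older into a rolling summary slot
    used = count_tokens(card) + count_tokens(summary)
    kept = []
    for msg in reversed(history):          # newest messages first
        t = count_tokens(msg)
        if used + t > budget:
            break
        used += t
        kept.append(msg)
    dropped = history[:len(history) - len(kept)]
    if dropped:
        summary = summarize(summary + "\n" + "\n".join(dropped))
    return "\n".join([card, "[Story so far: " + summary + "]"] + list(reversed(kept)))

Run it before every generation and the model never sees more than the budget, at the cost of an extra summarization call whenever old lines fall off.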
>>103084111Qwen is censored, but in a different way. Keep in mind the model is Chinese. The Chinese are not infected by identity politics and tend to be openly racist towards blacks, so I would expect a Chinese model to have no problem dropping N-Bombs.Start talking about Taiwan, though, and I bet you'll quickly see the censorship.
>>103088193https://github.com/malfoyslastname/character-card-spec-v2Use v2 to start with.
>>103088238>The Chinese are not infected by identity politics>>102447861 >Oh I should mention, qwen VL will NEVER mention a person's gender. Even when directly instructed to do so, as I did in my example. It's always "person", "they", "them". And it will never mention anything related to NSFW stuff even when given in the tags. I actually can't believe the fucking chinks are doing this gender neutral troon shit now.>>102447836 >In this image, there's a person
Chinese pronouns differ somewhat from English pronouns and those of other Indo-European languages. For instance, there is no differentiation in the spoken language between "he", "she" and "it" (though a written difference was introduced after contact with the West), and pronouns are not inflected to indicate whether they are the subject or object of a sentence.
source: https://en.wikipedia.org/wiki/Chinese_pronouns
Can you "men" stop looking for the boogeyman everywhere. Sometimes shit is just broken.
>>103088416If it has been trained on English, the coarseness of Chinese is no defense.
>>103088416This talks about the English portion of the model thoughever
>>103088238One key distinction is that while Llama often incorporates discussions on diversity, inclusivity, and consent and uses they/them pronouns, Qwen does not specifically insert the topic of Taiwan into its narratives.
>>103088436this >>103088445 is censorship. It makes sense that it is censorship. They don't need a defense about screwing up or even just being lazy about pronouns they don't give a shit about. There are plenty of scary things. You don't need to claim everything is. >>103088441yes. ESL people (not the /pol/ version of ESL) may have issues training a english model. It is more than just loading a dataset and machine goes whirrrrrrrrr. The humans involved will shape how it goes and get things wrong.
How do I get a model to write more than a few lines in a role-play response? I've had this issue since MythoMax despite playing with prompts and params. I'm currently on Mistral-Nemo-12B-Instruct.
>>103088565
Tell it to write longer replies. Aside from that, each model seems to have its own idea of how long an RP response should be, from a few lines to "hold my beer while I write a whole fucking novel, you don't mind that I write your character, too, right, of course not."
>>103083824bump
>>103088580
I tell it to write longer in both the prompt and author's note. It writes like two lines of pretty good stuff and then that's it. Even using the continue, the model will ask me in OOC what to do next cuz it's out of ideas until I drive the story forward.
>>103088609
Mixing GPU architectures like that can cause headaches.
>>103088609No, your inference speeds will drop to the slowest card in use.
>>103088636Sounds like you've found the limits of the model, at least in the "aware it's doing an RP" mode. You might be able to assert that it IS the character, but I have a feeling that whatever you do you'll be able to feel whatever template it's settled on for filling out responses.
>>103088666Is there an RP finetune for Mistral-Nemo-12B-Instruct? I tried Lumimaid and DoryV2 but Nemo is the best I've tried.
>>103088688Anotheranon might have a suggestion. I don't stop short of 70B.
>>103088710Favorite 70b?
a competing v3 spec lol
https://github.com/Bronya-Rand/Prom-Spec-V3/blob/main/Concept.md
>Prom V3 takes what already exists in V2 and RisuAI's V3 and adapts it to be easier to read for application developers to implement in their own codebases without the unnecessary bloat of Risu's assets folder
the absolute state....
>>103088731
Mist Large is my current go to for anything creative writing (NOT for anything requiring truthiness). I have to quant it down to IQ3 and it's pretty slow, but it seems to be able to sweat it out as far as 16k context. (I have a note of a long run that collapsed at 20k.)
Obviously most 70B's are L3.0 and L3.1 spins. Those it's kinda just shop around till you find something that doesn't spit out refusals barely above a whisper. Most recently I've been playing with that L3.1 Nemotron, and it seemed okay for relatively normie RP, but nothing to write home about.
And there's always CR+ I suppose.
>>103088840>I have to quant it down to IQ3 and it's pretty slow, but it seems to be able to sweat it out as far as 16k contexthow much ram + vram do you have?
>>103085514it's pretty funny when the sharty zoomers whine that the thread they shitposted to death is actually indeed dead. Yeah guys you destroyed one of the few decent places to talk about a very niche subject.
>>103085514/g/ is a dumpster anyway
>>103085595
>>103087253
Ok so I accidentally left it running while I played a little Factorio and it's back to being fast, no reloading the model or changing settings. Power usage seems the same for me but maybe you're right about cuda usage and it's in some kind of sleep state not using all the cores properly.
What is the best model under 12GB for NER?
>>103077338
>>103084075People use gpt4o latest and sonnet3.5 or opus 3.0 and those are pozzed as a motherfuck unless you feed it a 1000 token "You are Clau" jb.
>>103089196A three word prefill is enough to all safety features for Opus unless the key you're using ended up on the Anthropic naughty list and had 'extended safety features' enabled (which usually takes them months of continued abuse to do)
>>103089231This. Prefill is all you need. Even just {{char}}: is enough for Claude.
>>103087990>>103088041loveless migusex
>>10308895212 on the card, 64 system.
>>103089138This image is illegal>Kenzo Fujisue, a member of the Democratic Party of Japan attempted to obtain the rights to use the image of Hatsune Miku in his run for a seat in the Japanese House of Councillors. His hope was that the use of her image would appeal to younger voters. Crypton declined the use of Hatsune Miku’s image for political purposes.
https://x.com/si_pbc/status/1853184307063660723Seems like local 4o is coming sooner rather than later.
>>103088416Your entire post is basically wrong about everything, but I'm too lazy to elaborate. Pipe down midwit
>>103089631>firstNyo
>>103089663y-you too
Just think about all those dozens of open source cutting edge models that are currently on hold until the elections are over. In just a few days the open LLM sphere will look so very different to what we have now. By the end of the year talking about "LLaMA-405B", "Mistral Large2", "Qwen2.5-72B" will feel like talking about Pygmalion-6B right now. Models will be so much better.
>>103089976lolEven if those revolutionary models did exist, they would not be so obvious as to release them right after the elections. Maybe a month or two later.
It's called NoobAI but I have no idea how to use it.
>>103089993nta. And still, retards will point at a model that just happens to be released after elections and say "see? i told you!!", even if no model is released for the next 12 months.My prediction for the future is>In the near future, things will keep happening.
>>103089993That's why I said "by the end of the year". It'll start subtle by minor players who want to get a head start before this new golden age of LLMs truly starts. There will be groundbreaking stuff amongst these november releases already that will be better than what we have right now + models that truly make use of bitnet and all those other revolutionary improvements that they've been saving. However,it won't be comparable to the insane new models which we'll get at the turn of the year. November: Serious improvements, first true bitnet models, etcJanuary: "the next step", as significant as pre-/post-llama open source if not more
>>103090059nigger
>>103090072bitch
does llamacpp support text completion in sillytavern?
i keep getting "dry_sequence_breakers must be a non-empty array of strings" when trying to use it, no issues with chat completion api
>>103090162chat completion and text completion in st do the exact same thing anyway
When will they develop an architecture that is capable of remembering everything that is fed to it? Trying to give current models reasoning is like trying to give insects reasoning. There is no actual reasoning going on in there, the output just improves when given certain input, AKA stimulus. No reasoning can ever happen until the model has an actual honest to god memory that it can rely upon.
>>103089138Generating paper waste with Miku
>>103090175whys it complaining about the dry sequence breakers being empty when koboldcpp doesnt care then? its gotta have something to do with llamacpp then because they're the same prompts
>>103090162
Yeah. I too wonder what "dry_sequence_breakers must be a non-empty array of strings" means. it's so mysterious. Either disable DRY or put some shit on that list.
>>103090207
Either because it's got some defaults already or because it's disabled by default.
>its gotta have something to do with llamacpp then because they're the same prompts
No. It's you.
I built a tool at work which summarizes information across several of our systems to help management get a unified view on particular situations. Problem is I used Ollama, and now they want me to build out an API to extend these capabilities to other systems within the company. This use case calls for concurrent asynchronous inference of several models. It will all be served on prem as we do have the hardware for it, I just don't know of a backend framework for serving a scalable LLM endpoint. Any suggestions? Preferably something close to ollama and/or dockerized
Using koboldcpp (vulkan) and Sillytavern, on certain character cards I run into >processing prompt [BLAS] like every few messages for some reason despite being way under total context limit.I'm using Mistral Nemo Instruct 2407 Q5 K M on a 12gb GPU. Is there anything about a character card, or a topic I could be exploring that triggers this more often than usual? Typically this doesn't happen really ever until I hit total context limit and then it will occasionally do it along with context shifting but just on certain cards I'm constantly running into it.
>>103090250
vLLM
>Preferably something close to ollama and/or dockerized
Stop that.
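For the concurrency part specifically: vLLM ships an OpenAI-compatible server and does continuous batching of simultaneous requests on its own, which is the bit ollama is weak at. Typical invocations look like the following; the model name, GPU count and context length are just examples:

vllm serve mistralai/Mistral-Nemo-Instruct-2407 --tensor-parallel-size 2 --max-model-len 16384

or, dockerized:

docker run --gpus all -p 8000:8000 vllm/vllm-openai --model mistralai/Mistral-Nemo-Instruct-2407

Anything in the company that can talk to the OpenAI chat completions API can then hit it on port 8000.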
>>103090258there's some random function in ST, also any reference to {{user}} in character defs, system prompt etc. can be a problemdoes it happen on swipes?
>>103090258{{char} in sys prompt during group chat, or triggering LB?
>Still no good Japanese to English translator LLM outside of paid services full of censorship
Fuck man, Llama 3.1 405 might be the best, but it fucking blows compared to gemini pro 2 and 4o
>>103090059Sorry but you also said>In just a few days the open LLM sphere will look so very different to what we have nowSo there better be a big release in a couple of days."By the end of the year blah blah blah" doesn't invalidate that sentence or change it.
>>103090276>does it happen on swipes?Hm not sure but I don't think i've run into that.thanks I'll look through the card for those.>>103090282Not doing group chat. What's LB? I'm relatively new to this stuff.
>>103090307Lorebook / world info, which can have dynamic activation. Some cards have one embedded.
>>103090288So you are saying that better models than what we have now + bitnet and other improvements won't make the state of models look different than what we have now, even if it's nothing compared to the jump we will make by the turn of the year? I guess my expectations for the future are more humble than yours. To me, even a reasonable improvement + the first true implementation of things like bitnet in big releases qualify as a satisfactory step this month. More will come later, as mentioned.
>>103090330Oh gotcha, nah I stopped using those because of that and this one doesn't have an embedded one. Good idea though.
>>103090268>vllmThis popped up quite a bit in my research. I’ll take a look
>bitnet
>>103090284How much time did you waste not learning japanese?
>>103090359aint going anywhere near that malware
>>103090162
use the staging branch of sillytavern. easy fix to google.
on the other hand, i'm pretty sure sillytavern fucked up prompt caching for llama.cpp. it keeps reprocessing the whole prompt despite nothing changing. i've only found one vague reddit post about the issue. very sad.
>>103090377>on the other hand, i'm pretty sure sillytavern fucked up prompt caching for llama.cppDid you inspect the requests to make sure that the cache_prompt variable is being included?
>>103090377
that fixed it for me, thanks dude
what reddit thread did you find it on?
>>103090405
yeah i checked the output in the sillytavern logs and it had this:
cache_prompt: true
dry_sequence_breakers: [ '\n', ':', '"', '*' ]
>>103090368
I've been learning, actually. Using AI to basically act as a 'native' speaker also really helps when you have no one else to get help from. But this is only limited to very polite Japanese and doesn't help me with a rougher tone. Plus, it's better having something else do the grunt work.
>>103090412>>103090412>>103090412
>>103090284Qwen 2.5 32B and 72B is great though.