/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102987959 & >>102976869

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102987959

--Multiple perspectives feature reduces bias scores across social dimensions:
>102987976 >102988012 >102990681 >102988100 >102988132 >102988543 >102989070
--LLM RAM Calculator helps estimate memory requirements:
>102995865
--Introduction to LLM sampling:
>102990514 >102990633
--Transformers.js v3 supports WebGPU and ONNX runtime models:
>102989640
--Open source models struggle with self-correction and output quality:
>102993033 >102993349 >102994098 >102994218 >102994315 >102994381 >102994615 >102994913 >102995031 >102994946
--Meta publishes open source music model but deletes weights:
>102989563
--Gpt-soviets trained on moe-speech sounds better than Tomoko:
>102988359 >102988410 >102992236
--Advice on using Koboldcpp to test different AI models and quantizations:
>102988564 >102988717 >102988765 >102988805 >102988832 >102988754 >102988796 >102988844 >102988853 >102988849
--iGPU and APU performance for inferencing compared to discrete GPUs:
>102990238 >102990310
--Tips for generating smut stories:
>102989833 >102990006 >102990313 >102993566
--Newfag guide to AI models and terminology:
>102996361 >102996652 >102996683 >102996713 >102996779 >102997428 >102997550
--Mistral, GPT-SOVITS, and improved models anticipated:
>102997159 >102997173 >102997176 >102997185 >102997213 >102997203
--LLM ERP capabilities poll results and discussion:
>102993026 >102993613 >102993659 >102993982 >102993744 >102993800 >102994067 >102994113 >102994486
--INTELLECT-1 training progress update:
>102987982
--GLM-4-Voice sounds great in Chinese, but may be cherry-picked:
>102993857
--AI censorship discussion and risks of different modalities:
>102988230 >102988262 >102988351 >102988361 >102988387
--Miku (free space):
>102988359 >102989044 >102989186 >102991784 >102992608 >102996542 >102997941

►Recent Highlight Posts from the Previous Thread: >>102989254

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Where did the rentry go
>Flexora auto-selects which LLM layers to fine-tune, cutting training costs.
>Average accuracy improvement: +7.21% on Llama3-8B, +8.33% on ChatGLM3-6B, +1.98% on Mistral-7B-v0.1
https://arxiv.org/abs/2408.10774
https://x.com/rohanpaul_ai/status/1850673624384168224
>>102998216
>the real OP doesn't want to put it in
It's over...
>>102998257
I guess I'll just post the latest one here: http://rentry.org/pcrkt9pa
The SoVITS server API is a real piece of garbage. You can't even send the reference audio in the request; you have to have it on the server already and pass a path to it. Patched that and finally got inference working on a server CPU. Miss me with that gradio shit.
COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training
https://arxiv.org/abs/2410.19313
>FP8 training has emerged as a promising method for improving training efficiency. Existing frameworks accelerate training by applying FP8 computation to linear layers while leaving optimizer states and activations in higher precision, which fails to fully optimize memory usage. This paper introduces COAT (Compressing Optimizer States and Activations for FP8 Training), a novel FP8 training framework designed to significantly reduce memory footprint when training large models. COAT addresses current limitations through two key innovations: (1) Dynamic Range Expansion, which aligns optimizer state distributions more closely with the FP8 representation range, thereby reducing quantization error, and (2) Mixed-Granularity Activation Quantization, which optimizes activation memory using a combination of per-tensor and per-group quantization strategies. Experiments demonstrate that COAT effectively reduces end-to-end training memory footprint by 1.54x compared to BF16 while achieving nearly lossless performance across various tasks, such as Large Language Model pretraining and fine-tuning and Vision Language Model training. COAT also achieves a 1.43x end-to-end training speedup compared to BF16, performing on par with or surpassing TransformerEngine's speedup. COAT enables efficient full-parameter training of large models on fewer GPUs, and facilitates doubling the batch size in distributed training settings, providing a practical solution for scaling large-scale model training.
https://github.com/NVlabs/COAT
Repo isn't live yet. Another step towards FP8 training runs becoming more common.
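The intuition behind "Dynamic Range Expansion" (scale a tensor so its values fill the representable range before quantizing, then undo the scale afterwards) can be shown with a toy uniform quantizer. This is only a Python sketch of the idea, not COAT's actual FP8 scheme: real FP8 has a non-uniform grid, and the 17-level grid here is an arbitrary stand-in.

```python
def quantize(xs, levels=17, max_repr=1.0, scale=1.0):
    """Toy uniform quantizer standing in for a low-precision format:
    scale, clamp to [-max_repr, max_repr], snap to a uniform grid, unscale."""
    step = 2 * max_repr / (levels - 1)
    out = []
    for x in xs:
        y = max(-max_repr, min(max_repr, x * scale))
        y = round(y / step) * step
        out.append(y / scale)
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Optimizer-state-like values clustered far below the representable range:
xs = [0.001 * i for i in range(-50, 51)]  # |x| <= 0.05

naive = quantize(xs)  # grid step dwarfs the values; everything snaps to 0
expanded = quantize(xs, scale=1.0 / max(abs(x) for x in xs))  # fill the range

print(mse(xs, naive) > mse(xs, expanded))  # True: range scaling cuts the error
```

Per-tensor scaling like this is the simplest flavor; the paper's second trick (mixed-granularity quantization) amounts to choosing how finely those scale factors are assigned.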
kind of crazy to think about how AI is a solved science, and that with a couple more gens of Nvidia chips and a few years of datacenter and power infra expansion we'll be able to just use the current algorithms to create AGI
DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model
https://arxiv.org/abs/2410.12928
>We introduce DreamCraft3D++, an extension of DreamCraft3D that enables efficient high-quality generation of complex 3D assets. DreamCraft3D++ inherits the multi-stage generation process of DreamCraft3D, but replaces the time-consuming geometry sculpting optimization with a feed-forward multi-plane based reconstruction model, speeding up the process by 1000x. For texture refinement, we propose a training-free IP-Adapter module that is conditioned on the enhanced multi-view images to enhance texture and geometry consistency, providing a 4x faster alternative to DreamCraft3D's DreamBooth fine-tuning. Experiments on diverse datasets demonstrate DreamCraft3D++'s ability to generate creative 3D assets with intricate geometry and realistic 360° textures, outperforming state-of-the-art image-to-3D methods in quality and speed. The full implementation will be open-sourced to enable new possibilities in 3D content creation.
https://dreamcraft3dplus.github.io
https://github.com/MrTornado24/DreamCraft3D_Plus
big improvement over the original in how long it takes to generate a model (from 3 hours to 10 minutes)
>>102998360
I wonder why they're not exploring FP2 to FP12 models like Aphrodite engine did.
>>102998364
>>102998427
A discord chat / reddit meme.
>>102998171
Based.
>>102998444
INTELLECT-1 is at 29.50% complete, up from 27.60% last thread.
>>102998415
>exploring FP2 to FP12 models like Aphrodite engine did
>like Aphrodite engine did
This project really likes to copy other people's code and take credit for it.
>>102998559
Nah retard, he really did it himself
>>102998564
hi Alpin
>>102998596
cope
>>102998616
hi cope
>>102998483
>perplexity plateauing at 6.75
grim
>>102998564
>this file is copied from
https://github.com/vllm-project/vllm/pull/8751/files#diff
Literally most of the files are taken from this GitHub repo and the DeepSpeed library:
https://github.com/usyd-fsalab/fp6_llm
>We propose TC-FPx, the first full-stack GPU system design scheme with unified Tensor Core support of float-point weights for various quantization bit-width (6-bit, 5-bit, 3-bit, etc.)
>FP6-LLM is already integrated in DeepSpeed
But suddenly Aphrodite takes all the credit. What a piece of shit.
>>102998623
Not that I think this is going to be any good, but the perfect coombot shouldn't have the best perplexity, since that would lead to zero variety + lots of slop.
>we made it easy to support different quantizations
>suddenly Aphrodite takes credit for the entire research
>>102998635
>>102998674
What's the point of that research when it's not used practically?
>>102998216
>>102998257
>>102998265
>whole thread works together to make something for the benefit of the whole general
>discord OP doesn't go with it
Tale as old as time
Newfag here. Seems like most people running local are using it to generate images, which I don't give a shit about.
Does anyone run one locally for code generation or analysis? Trained on stackoverflow or a codebase or something?
>>102998265
I saved it for myself. OP is a fag as always
>>102998716
>Does anyone run one locally for code generation or analysis?
Yes. Deepseek is great for code and logic type work if you have the resources to run it at a high quant. Coder 2.5 at q8 is my daily.
>>102998705
But if you dare bake a thread without a Miku pic, be prepared for extreme shitstirring.
>>102998703
>not used practically
It's already in the DeepSpeed library, and vLLM used it through the library for FP6 support. Copying the files and adding the other combinations doesn't give you the right to take credit for everything, asshole.
Go fuck yourself, grifter.
>>102998756
you're just jealous
>>102998705
>Whole thread
No, it is just you newfags. Can you make /local c.ai exodus general/ please and fuck off?
>>102998747
I wish mikufaggot prime would dox some newfags.
>>102998811
>making a list of recommended models together to improve the general le bad and newfag
...because it just is, ok??
>>102998811
This general is literally an offspring of aicg, stop denying the roots
>>102998820
It is because oldfags know that finetunes just make models retarded or, at best, change the style slightly. There is no objective best style. And if you are an oldfag you just download the new thing and check it out yourself very quickly. By new thing I mean base models or instructs, of course.
>>102998838
>because oldfags know that finetunes just make models retarded or at best change style slightly
Retards, you mean. Or if by oldfag you mean someone who only ever used undi models.
>>102998834
The roots of my country have nothing to do with me not wanting some filthy migrants.
>>102998838
Here, try this one side by side with regular Qwen2.5. Then tell me if it's dumber / not any different:
https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.0
>>102998845
Hi drummer. Buy an ad.
>>102998838
The concept was to make the model table bigger and separate it into best use cases in the future, and more importantly to give the average /lmg/ browser something good to just use instead of constantly switching models.
While the rentry is still barebones, it went from crap month-old placeholders to 6 good model suggestions in just one thread.
I really don't get the seethe and hate at the concept, from both half the thread and OP apparently.
>>102998739
What do you use it for typically?
Can you tell it to do something like:
>give me a function that does x
>>102998867
All the more reason for you to make your own refugee camp.
>>102998874
>my seethe at a rentry has no footing so I'll just tell you to leave
>>102998870
>give me a function that does x
You can use it for that. Also: you can ask for a full program; you can copy a program into context (if you have enough) and query about it, get it to make improvements, or have it explain things; you can pair-program with it from ideation right through to a working product, including build-chain/makefiles; you can copy a schema into context and get it to generate complex SQL queries with a high degree of sophistication... just a few ideas of what you could use it for.
It is very good at instruction following in my experience. Better than Largestral and a bit worse than 405B.
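The schema-to-SQL use case above is just a chat-completion call against whatever local backend you run. A minimal sketch, assuming a llama.cpp llama-server (or koboldcpp) exposing the OpenAI-compatible `/v1/chat/completions` route on localhost:8080; the host, port, and example schema are made up for illustration:

```python
import json
from urllib import request

# Hypothetical local endpoint; adjust to your own backend/port.
API_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(schema: str, question: str) -> dict:
    """Pack a DB schema plus a question into a chat-completion payload."""
    return {
        "messages": [
            {"role": "system",
             "content": "You are a SQL assistant. Use only this schema:\n" + schema},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,  # low temperature keeps SQL output more deterministic
        "max_tokens": 512,
    }

def ask(schema: str, question: str) -> str:
    """Send the payload and return the assistant's reply text."""
    payload = json.dumps(build_request(schema, question)).encode()
    req = request.Request(API_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

demo = build_request("CREATE TABLE users(id INT, name TEXT);",
                     "Write a query counting users named anon.")
print(demo["messages"][0]["role"])  # system
```

The same two functions work unchanged against any OpenAI-compatible server, which is most of the backends in the OP list.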
>>102998927
>you can pair-program with it starting from ideation right to working product including build-chain/makefiles
Insane.
Ever use it to write tests? Or have it generate code from the tests you write?
Also, does your setup run it with a GPU, or can you do CPU only? My proxmox server only has an iGPU so I would assume it's dogshit, but worth an ask.
I want to branch out a bit more for the culture benchmark. What memes do you want your LLM to know about? I've already included the iToddlers BTFO meme.
>>102998999
Include ebussy
>>102998867
What part of buy a fucking ad is hard to understand?
>>102999135
don't feed the trolls
>>102999149
Upvoted!
>>102999135
You created the Rentry to make money, with the idea of selling the spots in it.
The existence of the Rentry will also make the thread get astroturfed to hell and back; everyone will keep spamming their models to appear as "organic word of mouth of the thread" to get put in the Rentry. This is a tactic that has been used before by Sao.
>>>102999135
>You created the Rentry to make money with the idea of selling the spots in it.
>The existence of the Rentry will also make the thread be astroturfed to hell and back, everyone will keep spamming their models to appear as "organic word of mouth of the thread" to be put in the Rentry. This is a tactic that has been used before by Sao.
>>102999179
Go back to sharty, incel.
>>102999186
after you take your meds
been out of touch for a year
what's the best uncensored model around 7B?
I saw people using multiple graphics cards to get more of a GGUF model into VRAM. Adding another 8GB VRAM card didn't scale as expected for me. With one card, I offloaded 15 layers (~6.5GB) to VRAM. Using two cards with Koboldcpp, I offloaded roughly 24 layers (4.5GB each), but beyond that I get OOM. It's not going to scale at 100%, right?
>>102998171
anybody got xtts2 running on win10?
Small update on the proof of concept for an RPG Maker MV based LLM front end. I figured out how to put the LLM response into an in-game message box now. (You also have to manually code your own word-wrap function, because for some reason it can't do that shit automatically.) Thankfully it uses monospace text so it's pretty easy to deal with, although justifying the text so that it doesn't look like shit is a whole other animal. It's also really janky in that it only allows a maximum of 12 lines per script, so you basically have to minify everything.
But I figure the way to do it would be to make a big library of bite-sized functions for everything and then use event calls in the game's own built-in event system to call them up.
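With monospace text, that word-wrap function is just a greedy split on character count. A minimal sketch of the algorithm in Python (RPG Maker MV scripts are JavaScript, so it would need porting, and the 28-column default width is an arbitrary assumption):

```python
def word_wrap(text: str, width: int = 28) -> list[str]:
    """Greedy word wrap for a fixed-width (monospace) message box.
    Words longer than width end up alone on an over-long line."""
    lines, current = [], ""
    for word in text.split():
        # +1 accounts for the space that would join the word onto the line
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)
            current = word
        else:
            current = current + " " + word if current else word
    if current:
        lines.append(current)
    return lines

print(word_wrap("the quick brown fox jumps over the lazy dog", 12))
```

Justification is the harder part the post alludes to: once lines are wrapped, you'd distribute the leftover columns as extra spaces between words, which is where it stops looking clean at 28 columns.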
>>102999419
yeah
>>102998739
>>102998927
Are you using this? https://github.com/QwenLM/Qwen2.5-Coder
From what I can tell DeepSeek is cloud only? Can you run it locally?
>>102998971
>Also does your setup run it with a GPU? Or can you do CPU only?
>My proxmox server only has an iGPU so I would assume it's dogshit, but worth an ask
I only use my GPU for context processing and infer on my CPU. You need a lot of memory and decently fast DDR5 to have a chance at using big models effectively.
>>102999509
>Are you using this? https://github.com/QwenLM/Qwen2.5-Coder
No, I've never had much luck with Qwen and haven't seen any convincing results from any of their recent models.
>From what I can tell DeepSeek is cloud only? Can you run it locally?
Yes, you can run it locally if you have the memory, and since it's an MoE the inference performance is relatively good for the results you get using CPU (3x the speed compared to Largestral).
>>102999463
no
>>102999419
>>102999463
>>102999734
Why are you like this?
>>102998855
NTA but I did. Getting outbid hard though.
>>102999751
stop samefagging
something went wrong with anthracite's datasets after v2
for all model bases, the v2 version is always smarter than the v3/v4 version
You guys aren't ready for what's coming
if they were genuinely close to AGI then people wouldn't be quitting, even if they were scared, because they'd want to be a part of it and maintain influence over it
so the quitting doomers are just engaging in the usual EA/rationalist speculative retardation, AGI isn't soon
>>102998705
>Discord
You can tell it's true just by looking at certain posters ITT.
Good night /lmg/
RIP my catto. He literally just passed away.
>>103000026
condolences
>>103000042
It's too late at night / too early in the AM to arrange a cremation right now. Is it weird that I feel too paranoid and weird about putting him on ice before rigor fully sets in?
>>103000026
>toxoplasma gondii incubator died
Oh no, anyway.
>>102999437
Not sure what you're trying to achieve, anon. Hate to break it to you, but isn't that kinda useless? Why would I run an RPG Maker game and then have inference from my typing something? Did you think that through properly?
RPG Maker would be perfect if you could show an LLM the tileset and have it create a map. Still lots of fuckery for the map's JSON.
The new Sonnet 3.5 can KINDA do it. First LLM I could show a picture of the tileset and say "tell me the x/y where I should place it, make a diverse map."
So currently this stuff fails at the start. With no maps, no events. Premade maps are boring.
>>103000085
>Why would you do something difficult when you can do something easy?
Because I have a dick.
>>103000100
I think you misunderstood me. Why would anybody use this?
The reason RPG Maker would be great is for creating a premade (coherent) story that's connected.
Why would you run inference through RPG Maker lol
This is like the guy who made NPCs in Skyrim talk through GPT. It's a gimmick and nothing more.
>>103000085
nta, but... wut?
Just so I'm clear: you're proposing that one should use an LLM to do things classical algorithms already do better, and berating anon for using LLMs for convos/text gen because that's "kinda useless"?
>>102999991
Neat! I'd love to have that for my VR waifu.
>>103000116
Because a few weeks ago I jokingly asked about the viability of creating an inferencing front end entirely in RPG Maker MV (back at the height of the SillyTavern drama). And now I'm going to make it a reality. Because I fucking can.
Not because it's a good idea.
Not because it's the best practice.
But to make people like you seethe about its very existence.
How are translation models? I'm partially interested in translating shitty Japanese h-games.
Accuracy is whatever, so long as it can form coherent sentences and do it in a timely manner.
>>103000125
Why would I run inference through RPG Maker? That doesn't even make any sense.
How will the story be created, for example? On the fly? Does NPC A know what NPC B said?
Is it just a premade map and then you let the LLM go from there? Just use SillyTavern and RP. Why the need for RPG Maker is what I am asking. What I am saying is that RPG Maker would be perfect if you could give the LLM a prompt and it would make maps, events, etc. Only one map with a bit of dialogue would be really cool. Create images with Flux to spice shit up.
But currently it already fails at the start with the map, unfortunately. LLMs are terrible at this. Sonnet 3.5 got a huge improvement but is still nowhere near enough.
>>103000137
If it's something for yourself and you fuck around, I don't care, but you post updates for other people. Obviously I assumed you wanted people to use whatever you make.
>>103000165
You would need to look at f95. There exists some tool that already uses ChatGPT for translation. It's pretty good but the slop smells. If you point it to local you could translate, I suppose. It keeps important stuff in context already, etc. I thought Gemma 27B is good at translating for a smaller model, but it's cucked, unfortunately. Not sure how well it will take ero content. It'll probably write in a style that's not hot at all.
>>103000165
>How are translation models?
That reminds me... did lcpp get plamo 100b support?
>>103000181
You are mega gay bro
>>103000193
So are you for replying anon.
>>102999425
pls respond. Why can I use more VRAM on a single card but less VRAM per card when I go multi-card? The language I see everyone use is "I have 32GB VRAM (8GB+24GB)" as if it stacks up. Do I need to run llama.cpp in the CLI for that kind of efficiency?
>103000198
Such a great way to own yourself lmao
are used 3090s still the best now?
>>103000198
>>103000193
I'm actually gay and give you both the seal of utter faggotry.
>>103000207
I don't know. I have 4 but I'd be wary about buying more.
They're still the least overpriced relative to what they offer, but their price should have declined a lot more by now than it has. They're getting rather old now, which means long-term reliability concerns beyond eBay's 30-day money-back policy.
>>103000236
>I'm actually gay
No need to make this more cringe than it already is.
>>103000000
>>103000165
Best is still the usual suspects, which are the online models. Mistral and Nemotron are on top for local, which shouldn't be surprising. What is surprising is Gemma: the 9B and 27B models are good for their size and punch above their weight against models many times larger.
>>103000247
Until AMD becomes more competitive with ROCm, 3090s aren't losing more value anytime soon. I expect prices to slowly atrophy until AMD or Intel has a solid-performance GPU with 24GB that is cheaper and runs AI faster than a 3090.
>>103000322
Isn't AMD desperately trying to become competitive in the AI training department?
svelk
>>103000335
On the enterprise level. None of them give a shit about hobbyist AI.
The MI300X is a very popular option for doing full finetunes. It doesn't quite match an H100 in terms of compute, but 192 gigs of VRAM is 192 gigs of VRAM. And full fine-tuning is quicker than LoRA training.
>>103000247
>>103000322
There's currently no competition in that niche, and Nvidia has already cut production of the 4090. Those expecting the release of the 5090 to lower prices on the 3090 are delusional.
I can imagine that Huang cannot sleep at night, haunted by the memory of his greatest mistake: releasing the RTX 3090 with its excessive VRAM.
Thank god there are no women in /lmg/ that the refugees could rape.
>>102998171
>>103000026
:(
>>102999761
Kek. What's the point of buying an ad for models you can download for free?
>>102998999
At least all the ancient memes from here. If they did their job properly, those should already be included in the dataset.
>>103000484
Because he has a crowdfunding link on his HF profile. But he bought an ad. So he's a legend.
Unlike the ones who inorganically shill their shit here all day long and don't buy an ad.
>>103000181
You can actually do better by including function calling, which is supported by llama3.1/3.2: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/#user-defined-custom-tool-calling
It's supported in vLLM (not in llama.cpp afaik): https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api
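Through vLLM's OpenAI-compatible server, a tool call is just an extra `tools` array in the chat-completion request, and the reply comes back as a `tool_calls` entry on the assistant message. A minimal sketch of both sides; the `move_player` tool and its schema are made-up examples for the RPG Maker use case, not part of any API:

```python
import json

def build_payload(user_text: str) -> dict:
    """Chat-completion request advertising one hypothetical game tool."""
    return {
        "messages": [{"role": "user", "content": user_text}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "move_player",  # made-up example tool
                "description": "Move the player character on the map.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "x": {"type": "integer"},
                        "y": {"type": "integer"},
                    },
                    "required": ["x", "y"],
                },
            },
        }],
    }

def extract_call(message: dict):
    """Pull (name, args) out of an assistant message, or None if plain text."""
    calls = message.get("tool_calls") or []
    if not calls:
        return None
    fn = calls[0]["function"]
    # OpenAI-style responses carry the arguments as a JSON string
    return fn["name"], json.loads(fn["arguments"])

# Shape of an assistant message as returned by OpenAI-compatible servers:
sample = {"role": "assistant", "content": None,
          "tool_calls": [{"id": "1", "type": "function",
                          "function": {"name": "move_player",
                                       "arguments": '{"x": 3, "y": 7}'}}]}
print(extract_call(sample))  # ('move_player', {'x': 3, 'y': 7})
```

The front end then dispatches on the returned name, runs the matching game function, and appends the result as a `tool` role message for the next turn.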
Is there an extension or something that works with SillyTavern and SD Forge that parses the local model's output and turns it into a Pony-friendly prompt that uses booru tags and pastes it into ForgeUI? I don't know how to make extensions, I have zero coding experience, and I just recently learned how to even write python shit. I wanted to use o1 to hold my hand through the entire process of making my own extension if nothing like that exists.
will anon buy the 50 series?
>>103000785
Sure, when the 70 series comes out I'll buy a 5060.
I am at the cutting edge of AI baneposting.
>>102999425
>>103000200
Generally speaking it is better to have the VRAM on a single GPU vs. spread out over multiple GPUs, but the increase in the number of layers that you can fit should be roughly linear with total VRAM.
If you use the exact same model and settings for both cases (context size and KV cache quantization are most relevant) and there are no other applications consuming relevant amounts of VRAM, that should not be happening.
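As a sanity check on "roughly linear", the expected layer count is just (per-card VRAM minus per-card overhead) divided by VRAM per layer, summed over cards. A back-of-the-envelope helper; the 0.5 GB per-card overhead for CUDA context and compute buffers is a guess, not a measured figure:

```python
def layers_that_fit(vram_gb, gb_per_layer, overhead_gb=0.5):
    """Estimate how many layers fit across one or more cards.

    vram_gb: list of per-card VRAM totals in GB.
    overhead_gb: assumed per-card reservation (driver context, buffers);
    0.5 GB is a placeholder, measure it on your own setup.
    """
    return sum(int((v - overhead_gb) // gb_per_layer) for v in vram_gb)

# The anon's numbers: ~6.5 GB for 15 layers -> ~0.43 GB per layer
per_layer = 6.5 / 15
print(layers_that_fit([8.0], per_layer))       # one 8 GB card
print(layers_that_fit([8.0, 8.0], per_layer))  # two 8 GB cards: roughly double
```

If the observed two-card count falls well short of double like in the post above, the missing VRAM is usually going to context (KV cache duplicated or placed unevenly) or to another process, not to the split itself.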
>>103000785
The value of GPUs for AI is primarily determined by their VRAM-to-price ratio, along with reasonable performance and adequate support. I do not expect the 50 series to fare well on these metrics.
good morning /lmg/. are we still at the peak of the newfag infestation wave?
>>103001074
One (1) "backend confused" newfren made y'all shit your pants for 20+ hours, think about it.
>>103000785
yes
>>103001105
you need to go back to c.ai. all of you
>>103000207
https://www.digitaltrends.com/computing/its-time-to-bid-farewell-to-nvidia-rtx-30/
>>103001131
"I talk to chatbots, but LOCALLY" is not a hobby worth gatekeeping.
>>103000903
Nice tinyllama you got here
>>103001148
yes it is
you're not wanted here
get the fuck out
>>103001148
Don't forget limited context, too.
>>103001107
Restraining the Pochiface in a closet until it dies from dehydration.
>>103001193
What the fuck are you talking about, schizo?
>>102998171
Fall into something new
>>103000932
Thanks for acknowledging that VRAM usage scales near linearly. It should be similar with either GGUF or exl2, right?
With that, I can focus on my settings and have more confidence getting another card.
>>103001393
>It should be similar with either gguf or exl2, right?
I'm not particularly knowledgeable when it comes to the internal workings of ExLlama, but I don't see why it would be different.
One thing that could as of right now make a difference with llama.cpp and derivatives: if you set --split-mode row the scaling will not be linear, because as of right now the KV cache is only on a single GPU.
>>103001334
Fall into a puddle of Leaku pee
>>103001699
>dystopian robot reference
Based
>>103001699
>not a card
aw
How does the CPU affect performance (in general) in GPU inference and GPU training?
I only found this link for inference, which says it has almost zero effect:
https://www.pugetsystems.com/labs/articles/effects-of-cpu-speed-on-gpu-inference-in-llama-cpp/
And for training I see that if the CPU doesn't have enough cores (at least 4 cores per GPU) and memory channels, it will bottleneck training.
How would GPU training be affected by, for example, the difference between 2x AMD EPYC 7763 64-core and 2x AMD EPYC 9965 192-core? In raw benchmarks the latter is more than twice as fast:
https://openbenchmarking.org
Does anon know any other benchmarks/links?
>>103001952
Thanks anon, I really appreciate it.
What's the current best poorfag CPU model?
I want something to talk to while waiting for a bitnet model to come out.
>>103002035
Mistral Nemo 12B or one of its finetunes.
If you want really fast, OLMoE, but it has low context.
But I'd still buy even a 16GB GPU if I were you... just in case...
>>103002035
That thing doesn't even look like Miku, wtf are they smoking to pretend it's her?
>>103002097
>>103002035
pyg6b
>>103002127
Okay zoomer
>>102999437
That's a really cool experiment, anon. See how far you can take the concept.
>It's also really janky in that it only allows a maximum of 12 lines per script, so you basically have to minify everything.
What the fuck.
Can you at least break things into multiple script files?
>>103002097
>That thing doesn't even look like Miku, wtf are they smoking to pretend it's her?
https://github.com/ggerganov/llama.cpp/pull/9702
>added implementation of DRY sampler (post-refactor) #9702
Was merged in on Friday.
>>103002205
Yeah, you can chop everything up into function definitions and function calls and attach them to common events, as long as they are attached to the window object; otherwise it doesn't treat the scene as a contiguous environment.
>>103000181
Not that anon or any other anon involved in this conversation or project yet, but I am excited about anon's research.
It would be cool as heck to be able to have a conversation with my LLM by walking around and interacting with an SNES-looking map.
Sure, an RP experience would be better just using a text-to-text interface. But think about it the other way around: this could improve RPG Maker experiences.
I asked in a prior thread about llama-server's behavior when receiving a prompt larger than the configured context size: >>102991521
>Back in the day llama.cpp server would crash if you tried to stuff a prompt larger than the configured context size into it; it no longer does that.
>Is it safe to assume that it's simply cropping the context at the top?
>Is there a reason one would want to do that instead of just setting the correct prompt size in the frontend software?
Can someone explain to a vramlet why everyone is offloading the kv cache in Kcpp? Is there some downside to keeping it in RAM?
>>103002629
it's slower
>>103002629
Yes.
Any context processing becomes dog fucking slow.
With flash attention and quanted cache, putting the KV cache in RAM to fit one, maybe two, more of the model's layers in VRAM is a bad tradeoff.
>>103002654
Both prompt processing and inference, or just prompt processing? Because I seem to get better inference speeds when offloading more layers.
>>103001809
https://rentry.org/miqumaxx
Why are you guys running locally when you can get a free Llama 3 405B API key straight from OpenRouter?
Genuine question.
>>103002587
>Is it safe to assume that it's simply cropping the context at the top?
Even if it parses the whole prompt, whatever falls out of the context is just gone. Whatever method it uses (whether it's directly cropping or shifting the context), I don't think it makes much of a difference.
>Is there a reason one would want to do that instead of just setting the correct prompt size in the frontend software?
Do what exactly? The frontend doesn't necessarily know how long (in tokens) your input is, nor how much of the context is used, without the backend telling it. As far as I know they just send the text raw-ish (adding chat format tokens and all the extra things ST does, for example) and let the backend tokenize and process it. So even if you set the context length in the frontend, I don't think they can do much with that information. Maybe they guesstimate how many tokens there are, but it's never going to be precise. Unless they actually tokenize the user input, but still, I don't think they do.
>>103002817
To remove the dependency on someone else's stuff.
And I'll just wait for the next fuck to ask again next thread.
>>103002364
based. i'm going back to local again.
>>103002817
Nothing is free.
>>103002818
>Maybe they guesstimate how many tokens there are,
At least as far as ST is concerned, it calls an API to tokenize the text to know how many messages it can fit.
Meaning that ST actually knows how many tokens are in the prompt it's sending no matter the model the backend is serving, as long as the server has a tokenization API, that is. Otherwise it just guesstimates using a default tokenizer on the frontend itself.
See: https://docs.sillytavern.app/usage/core-concepts/advancedformatting/#tokenizer
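The fit-as-many-messages-as-possible logic built on that tokenize call amounts to dropping the oldest messages until the count fits the budget. A sketch; the whitespace-split "tokenizer" is a stand-in for a call to the backend's real tokenize endpoint (e.g. llama.cpp's POST /tokenize), not what ST actually ships:

```python
def trim_history(messages, budget, count_tokens):
    """Drop oldest messages until the total token count fits the budget.

    count_tokens stands in for a backend tokenize call; any callable
    returning a token count for a string works.
    """
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # oldest message falls out of context first
    return kept

# Stand-in tokenizer: one "token" per whitespace-separated word.
toy = lambda s: len(s.split())
history = ["hello there anon", "how are you", "fine thanks"]
print(trim_history(history, 6, toy))  # drops "hello there anon"
```

In practice you'd also reserve part of the budget for the system prompt and the response (`max_tokens`), which is why the usable chat history is always smaller than the configured context.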
>>103002847Yeah but you can just make your stack or do whatever you need and replace it with your own locally ran shit at a later date instead. It's not like it's a proprietary model you're getting dependent on.>>103002851It's free for now and not proprietary which should be good enough, no?https://openrouter.ai/nousresearch/hermes-3-llama-3.1-405b:free/apihttps://openrouter.ai/meta-llama/llama-3.1-405b-instruct:free
>>103002817>>103002886Interesting. What's the catch here? Aside from the data collection of course.
>>103002817>Why are you guys running locally when you can get free Llama 3 405B API key straight from Openrouter?Because I can run 405b locally.I'll let someone else explain why giving others your logs for free in exchange for access is actually a bad deal.
>>103002886
>Yeah but you can just make your stack or do whatever you need and replace it with your own locally ran shit at a later date instead. It's not like it's a proprietary model you're getting dependent on.
I like running models on my own hardware, without needing the internet or depending on others. Even if it keeps working forever, I'd still depend on them. If it worked just *once* on my own setup, it will always work.
>>103002903
They actually say they don't collect your data on the free API. The "catch" is to get developers building apps around OpenRouter's APIs so that businesses are locked into their ecosystem in 3-5 years' time.
They don't gather data precisely because they want businesses to use their service; businesses would immediately forgo them if they collected their data and thus trade secrets.
Which is precisely why I wonder why people in /lmg/ use their own personally hosted stuff instead of free non-proprietary stuff that doesn't gather your data at all and is guaranteed to stay available for free until 2026.
>>103002629
>>103002654
>>103002664
>>103002674
So after some testing, offloading seems better with longer prompts. The effect on processing speed seems minuscule? This is with a Radeon 7800 XT 16GB, so maybe it's just some ROCm thing.
>"lowvram + fa" 40/57 layers
10k context
ProcessingSpeed: 135.65T/s
GenerationSpeed: 3.94T/s
zero context
ProcessingSpeed: 144.31T/s
GenerationSpeed: 6.60T/s
>"fa" 46/57 layers
10k context
ProcessingSpeed: 133.35T/s
GenerationSpeed: 3.31T/s
zero context
ProcessingSpeed: 161.32T/s
GenerationSpeed: 7.51T/s
What's the best local model for programming?
ChatGPT sucks, Claude has pathetic rate limits.
>>103002916
>giving others your logs
They explicitly state they don't keep logs because they want to attract enterprise businesses and developers with their free API access.
>Because I can run 405b locally
Unless you also have free electricity, it still makes more sense to use their free API, no?
I'm legitimately trying to figure out why people here would prefer running locally over a free API that doesn't gather data.
>>103002952
e: lowvram is the second one, sorry
>>103002934
>They actually say they don't collect your data on the free API
cool story
>>103002953
Never tried one for programming, but I've seen anons recommending DeepSeek and Qwen2.5. Depends on your hardware, I suppose.
>>103002934
Well, I for one had no idea. I'll take a look at it and see if it's worth the switch.
Having my shit locally and offline is really nice, regardless. I'm a tinkerer, so there's a certain pleasure in having to fiddle with models and launch parameters and such. Also, fine-tuning my own small models.
But for local productivity, I guess there's not much reason not to use such a large model for free.
>>103002952
>>103002964
If you benchmarked it and it works better for you, then awesome.
>>103002958
>free API that **CLAIMS** doesn't gather data.
>>103002953
I was using DeepSeek through their website and it's really, really nice.
>>103003019
They are trying to compete with OpenAI, Anthropic and Google on the enterprise API front. All of those other providers state that they don't gather or use your data through the API portal. It's in their best interest not to, as it would scare away the very enterprise demographic they want to attract with this.
It would also just be a lawsuit waiting to happen, bankrupting them in one go if even a single developer/company using their API found out about it.
>>103003035
It's in their best interest not to do so; they have no incentive to lie, as it would scare away the companies and developers they're trying to attract with this.
>>103003035
>you saved the data. i'll sue you
>we don't. but if you can show we save data, you'll have to show it in court
>nevermind, then. how much for 100m extra tokens?
>>103002934
>y tho?
I also own my own domain, run my own DNS and mail, host my own websites and private "cloud" apps, and manage my own offsite backups.
This is /lmg/. We don't need a reason, but if we did, you might want to check the news for the various rugpulls starting to happen (speedrunning to the point of normie suicide already). Those won't slow down once people become dependent on proprietary cloud stuff.
Self-determination pays for itself often enough that it shouldn't be dismissed out of hand.
>>103003110
That means they have an incentive not to get caught.
Truth is you can't really know until it's tested, just like with all those "zero log" VPNs that turned out able to help identify people when the three-letter agencies came knocking.
I'll acknowledge the other side too: just like how there are true zero-log VPNs out there, OpenRouter could be telling the truth, but we can't really know either way until it's tested somehow.
>>103000298
>Big 3 are all around the same level and peaked with gpt 4 turbo
It is so over for LLMs.
>>103003061
>I also own my own domain, run my own dns and mail, host my own websites and private "cloud" apps and manage my own offsite backups.
I do all of those things as well. I'm on /lmg/ posting here for a reason.
The question here is essentially why not use a free API of a non-proprietary model that doesn't gather logs or data for as long as possible, when if it ever goes down it's just a one-line replacement with your own self-hosted API link.
I understand not wanting to depend on proprietary models, or not wanting your access to get logged, but I don't understand why not use a free API that doesn't log anything and serves a non-proprietary model you could hypothetically run yourself whenever needed.
Is a used A770 16GB any good?
>>103003110
Are you saying that some people are willing to lie on the internet? Preposterous...
>>103003133
>that doesn't gather logs or data for as long as possible
>that doesn't log anything
Do you really trust them that much?
>>103003110
Those zero-log VPNs essentially target consumers. OpenRouter is trying to target developers and businesses with these API keys; completely different demographics and different incentive structures. OpenRouter thinks it can make money through vendor lock-in over the years, using the free API as a Trojan horse. Logging data would be equivalent to shooting themselves in the foot for a tiny sliver of money compared to what they could have had by telling the truth and roping in a bunch of businesses that get hooked on their system over time.
It just makes no financial sense for them to secretly log. A lot of risk for barely any reward.
>>103003133
One reason I can think of is to not get "used" to the smarts of a 405B, then possibly have to settle for less when it inevitably stops being free and you can't run that at home.
>>103002934
>https://openrouter.ai/docs/provider-routing
>Data Privacy
>Some model providers may log prompts, so we display them with a Data Policy tag on model pages. This is not a definitive source of third party data policies, but represents our best knowledge.
What do you think openROUTER means? They aren't guaranteeing the random providers aren't logging (and I'd bet OpenRouter themselves are, too... "We use the information we collect for various purposes, including: To improve our services", from their privacy page).
https://arstechnica.com/tech-policy/2021/08/zoom-to-pay-85m-for-lying-about-encryption-and-sending-data-to-facebook-and-google/
The answer is CP. People want to RP with CP and they can only do so without paranoia on their self-hosted model. It's that simple, no need to get philosophical about things.
>>103002903
>>103002934
>>103002958
Not running locally atm, but valid reasons that I see are:
1. I don't believe they're not gathering data.
2. It's rate-limited.
Still pretty nice. I don't even care that they gather my data and read my coom logs. So I'm gonna use it now.
Good morning, /lmg/
>>103003162
That's your interpretation and assumptions. As I said, we can't know until it's experimentally proven. That's all.
I'm not telling you not to use it, just arguing that a policy by itself is not a guarantee; even legally binding contracts are broken when it's more profitable.
>>103003161
Shocking, I know.
>>103003215
This. I'm going to give it a go too.
>>103003207
I do it on my paid Claude account.
>>103003168
OpenRouter promised to keep it free until at least the end of 2026, so you have two years to get the hardware to run something equivalent. By 2027 I bet you can run a 70B model with equivalent smarts. Probably better inference hardware out by then for cheaper as well.
So I bought those RTX 3090s for nothing?!....
>>103003271
I'll buy them for $200 a piece.
>>103003250
>Openrouter promised to keep it free until at least the end of 2026.
Where? And how can they promise that when they're not the ones running the models?
>>103003271
I'll take them off your hands for $220 each.
>Yes please. Despite its reasonable-sounding theoretical foundations, TFS has not stood the test of time, and the collective experience of tens of thousands of power users has shown that it is soundly beaten by Min-P in practice, at a fraction of its complexity.
>FWIW, I believe that Typical and Mirostat should meet the same fate. Judging from presets posted in forums and model cards, they're very rarely used nowadays, and Mirostat in particular is difficult to build an intuition for. The lesson appears to be that samplers using entropy/perplexity as a control metric are overlooking one or multiple key properties that strongly determine output quality. A combination of Min-P and XTC can do anything those samplers can do and more, and do it in a way that is comprehensible to a human operator thinking about probability distributions.
>Heck, even Top-P and Top-K (for values > 1) should probably be deprecated and show a warning when used. Min-P fixes the conceptual flaws of both of them and is flat out superior. AFAIK, those samplers originate from the early OpenAI days and were then incorporated into HF Transformers, after which everyone else cargo-culted them into every frontend and backend. Their continued availability (and the fact that they are presented at the very top of most frontends' sampler settings) really hurts the experience of new users, who probably shouldn't be using any sampler other than Temperature, Min-P, XTC, and DRY.
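For reference, the Min-P rule being argued about fits in a few lines: the cutoff scales with the top token's probability instead of being a fixed count (Top-K) or a fixed cumulative mass (Top-P). A toy sketch over a plain dict distribution, not any backend's implementation:

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens whose probability is at least min_p times the top
    token's probability; drop the rest and renormalize.
    Toy sketch of the Min-P idea, not any backend's actual code."""
    threshold = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

# A confident distribution: with min_p=0.1 the cutoff is 0.1 * 0.70 = 0.07,
# so "an" (0.06) and "glorp" (0.04) are dropped and the rest renormalized.
dist = {"the": 0.70, "a": 0.20, "an": 0.06, "glorp": 0.04}
print(min_p_filter(dist, min_p=0.1))
```

When the model is uncertain (a flat distribution), the same `min_p` keeps many candidates; when it's confident, almost everything is pruned. That adaptivity is the claimed conceptual fix over Top-P and Top-K.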
>>103003271
250.
>>103002886
>limited to 20 requests per minute and 200 requests per day
>>103003220good morning miku
>>103003348Nothing wrong with that. Everything should be enjoyed in moderation.
>>103003337
I was just reading it. I agree for all of them except top-k. Stop posting self-portraits.
>no string ban/antislop
Yeah, that's a shill.
>>103003360
Kys shill
>>103002886
>https://openrouter.ai/nousresearch/hermes-3-llama-3.1-405b:free/api
>8,192 context
https://openrouter.ai/meta-llama/llama-3.1-405b-instruct:free
>8,000 context
Also you're being weirdly pushy about moving people from their local setups to an API.
>>103003348
>You can just make more accounts and request more API links as needed
>>103002934
>They actually say they don't collect your data on the free API
Where do they say that? I can't find any page on their site that actually states that policy.
Buy the 50 series goy, you'll get to run 16GB VRAM at twice the speed. They even use 600W, the higher this number is the better btw, because you know you're putting your GPU to WORK. Unused electricity is wasted electricity. Accelerate!
>>103003337
>p-e-w
Isn't he the guy who made Min-P, XTC, and DRY? And now he wants to remove all the samplers except his? How totally unbiased.
>>103003337
TFS still has its uses as a less aggressive Min-P. Just because redditards couldn't figure out how to use it doesn't mean that it's bad. Typical and Mirostat can go.
>>103003610
Don't worry, p-e-w will choose for you, he knows best, and ggerganov agrees:
>@p-e-w Thanks for the insights. I will use this opportunity to ask you about what do you think is the best sampling strategy when the LLM is used for fill-in-the-middle (FIM) tasks (e.g. code completion). Do you have any experience and recommendations? My intuition is that one would generally want greedy sampling in this case (since we are not looking for variety and/or creativity), but with some conservative way to terminate early as soon as the LLM is not "confident" about the next token. Do you agree and if yes, what would be a good approach to model this during sampling?
>>103003576
Not Min-P, but I think he did the other ones. As far as I remember, Min-P was kalomaze. Mirostat and Typical are shit, though. They're not intuitive.
Also, the new k-shift in the other PR is the lazy version of what he mentions in the paper. Absolutely useless without the rest of the paper. It defeats the purpose of implementing it at all.
>https://github.com/ggerganov/llama.cpp/pull/10048
>https://arxiv.org/pdf/2402.10200
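The "rest of the paper" being referenced (arXiv 2402.10200, CoT-decoding) scores each of the k alternative decoding paths by confidence; as I read it, that score is the average gap between the top-2 token probabilities over the answer tokens, which k-shift alone doesn't compute. A rough sketch of that metric under my reading:

```python
def answer_confidence(step_probs):
    """Average gap between the top-2 token probabilities across a decoded
    answer span -- a sketch of the path-scoring metric from the
    CoT-decoding paper (arXiv:2402.10200), used there to rank the
    k alternative decoding paths.
    step_probs: one list of token probabilities per generated token."""
    gaps = []
    for probs in step_probs:
        top2 = sorted(probs, reverse=True)[:2]
        gaps.append(top2[0] - (top2[1] if len(top2) > 1 else 0.0))
    return sum(gaps) / len(gaps)

# A confident path (big top-2 gaps) outscores an uncertain one (small gaps).
confident = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
uncertain = [[0.4, 0.35, 0.25], [0.5, 0.45, 0.05]]
print(answer_confidence(confident) > answer_confidence(uncertain))  # True
```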
Has anyone here figured out how to use batching effectively locally to maximize throughput in some useful way?
>>103003645
>https://github.com/ggerganov/llama.cpp/pull/10048
>I am currently sick and will be off the computer for a few days, but I intend to do a full review of this interesting PR soon.
>but please remove all samplers but mine kthx
>>103003666Do you really have nothing to say?
>>103003701Hope you get well soon so you can resume gutting the project?
>>103003627
>Don't worry, p-e-w will chose for you, he knows best, and ggerganov agrees
God bless koboldcpp; they'll probably keep the old samplers, like they kept support for old quants in the early days.
>>103003646
>locally
What's the difference with "remotely"? Does batching mean something different there?
Batching is more useful for offline stuff (as in non-interactive): batch 1000 completion requests, check them later. Like making synth datasets and that sort of stuff.
>>103003729Schizo.
>>103001193Fuck you.
>>103003763j-e-w
>>102999437
I wanted to do something like this for a long time. Is it open source?
>>103003745
>What's the difference with "remotely"?
I mean for a single-user local scenario. Multi-character RP, or maybe multiple solutions with different samplers for review, etc.
Batching is obviously useful in multi-user provider-type scenarios.
>Like making synth datasets and that sort of stuff
Yeah, like that, but more local-specific niche stuff.
>>103003808
Maybe to process the prompt in parallel with the running chat to do stuff like summarization?
I bet that it could be leveraged for some real-time shit like talking to multiple characters "online" and the like.
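The single-user ideas above all boil down to firing several requests at once so a batching-capable backend (e.g. llama.cpp's server with `--parallel`, or vLLM) can serve them concurrently instead of serially. A minimal sketch with a stub standing in for the real completion call; the stub and function names are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def generate(prompt):
    """Stand-in for a real completion call (e.g. POSTing to a local
    OpenAI-compatible /v1/completions endpoint). Hypothetical stub."""
    return f"[reply to: {prompt}]"

def batch_generate(prompts, workers=4):
    """Issue several requests concurrently so a batching-capable backend
    can process them in one batch instead of one after another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so replies line up with prompts.
        return list(pool.map(generate, prompts))

# e.g. several characters answering the same scene in parallel
replies = batch_generate(["Alice:", "Bob:", "Narrator:"])
print(replies)
```

The throughput win comes entirely from the backend: the client just has to keep multiple requests in flight at once.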
>>103003732
Hopefully he gets bored before he removes all the samplers, hopefully.
>I rarely stick to one idea for more than a few months
>https://github.com/p-e-w
>https://worldwidemann.com/about/
>>103002916
>Because I can run 405b locally.
I've got 256GB of DDR4 already; another 256GB is about $300. Despite having a 28-core Platinum 8280L, it's still going to be dogshit slow, right?
I've played with 405B online, and it's better than 70B, but is it enough better to be worth dropping into the s/t gen speed range?
>>103003788Isn't he reinventing temperature?
>>103003798
>Philipp Emanuel (((Weidmann)))
>Jewish (Ashkenazic): artificial name from German Weide 'pasture meadow' + Mann 'man'.
Oy
>>103003891
No. Temperature modifies probabilities. He's talking about measuring the "confidence" of the generated tokens and deciding when to stop generating.
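One simple way to model that "stop when not confident" idea for FIM completion: greedy decoding that halts as soon as the top token's probability drops below a threshold. A toy sketch with a stand-in model function; all names here are hypothetical, not any backend's API:

```python
def greedy_until_unsure(next_token_probs, prompt, min_conf=0.5, max_new=64):
    """Greedy decoding that stops once the model's top-token probability
    falls below min_conf -- one way to model 'terminate early as soon as
    the LLM is not confident'. next_token_probs is a stand-in for a real
    model: context string -> {token: probability}."""
    out = []
    for _ in range(max_new):
        probs = next_token_probs(prompt + "".join(out))
        token, p = max(probs.items(), key=lambda kv: kv[1])
        if p < min_conf:
            break  # model is no longer sure; let the human take over
        out.append(token)
    return "".join(out)

# Toy model: very sure about the closing paren, then uncertain afterwards.
def toy_model(ctx):
    if not ctx.endswith(")"):
        return {")": 0.95}
    return {"\n": 0.3, ";": 0.3, " ": 0.4}

print(greedy_until_unsure(toy_model, "foo(bar"))  # -> ")"
```

This is the temperature distinction in code: no probability is reshaped; the distribution is only inspected to decide whether to keep going.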
>>103003886
>it's still going to be dogshit slow, right?
Yes. I'm best-case cpumaxxing and struggle to hit 1t/s at Q8. You'll likely hit 10% of that with that description of your setup.
>but is it worth...
It can be if you need frontier capabilities, but I daily DeepSeek. It's the best speed-to-capability tradeoff in my case. I only run 405B when I need to.