/g/ - Technology





File: CleanupCrew.png (1.94 MB, 1024x1528)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101431253 & >>101421477

►News
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1671214888959931.gif (437 KB, 500x483)
►Recent Highlights from the Previous Thread: >>101431253

--Paper: Mixture of A Million Experts: >>101431545 >>101431583 >>101431611 >>101431637 >>101432015 >>101432203 >>101432320 >>101431838 >>101431631 >>101431757 >>101432525
--Fine-tuning a Language Model to Generate Personalized Cover Letters: Seeking Recommendations and Exploring Alternatives: >>101436669 >>101436713 >>101436900
--AMD Promotes CPUmaxxing with EPYC Genoa: Outlined GPU Performance in LLM Tasks: >>101433552
--Uranium has 82 protons, but typos and sampler settings can confuse LLMs: >>101434145 >>101434215 >>101434236 >>101434265 >>101434332 >>101434423 >>101434467 >>101434789
--Question about PSU lines and GPU power requirements: >>101438498 >>101438516 >>101438541 >>101438675 >>101438873 >>101438790 >>101438988
--SCALE Programming Language and its support for llama.cpp: >>101432707 >>101432797
--Llama 3 Finetune Tops BFCL Leaderboard, But Are Function-Calling Models a Meme?: >>101434816 >>101434850 >>101434865 >>101434884
--Lack of Development Discussion and Frustration with LLM Dominance: >>101434645 >>101434734 >>101434859 >>101434939 >>101435320 >>101435452 >>101435571
--Miku (free space): >>101431341

►Recent Highlight Posts from the Previous Thread: >>101431260
>>
Worship the Miku
>>
File: hatsune_miku_at_cs2.jpg (129 KB, 1024x576)
>>101439126
>Uranium has 82 protons
Shrinkflation has finally reached the nuclear energy industry.
>>
>>101439243
Yes.
>>
Is there any chance of getting decent results with an RTX 2070?
I tried some Llama 3 model yesterday and at most I could get one-liner replies. Escaping Claude's grasp isn't that easy, it seems. Haven't tinkered with the settings in oobabooga yet, though.
>>
>>101439327
no
>>
What's the most affordable GPU for achieving 80+ T/s on Gemma-2 9b 5bpw in a headless server? Are there any AMD cards capable of doing it?
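My own napkin math, assuming generation is purely memory-bandwidth bound (this ignores KV cache, prompt processing and real-world efficiency, so treat it as a lower bound):
[code]
# rough lower bound: every generated token has to stream the whole model out of VRAM
params = 9e9                      # Gemma-2 9B
bpw = 5.0                         # ~5 bits per weight quant
model_bytes = params * bpw / 8    # ~5.6 GB
target_tps = 80
print(model_bytes * target_tps / 1e9, "GB/s of effective bandwidth needed")  # ~450 GB/s
[/code]
So on paper anything in the ~500 GB/s class should manage it, but effective bandwidth lands well under the spec sheet number, which is why I'm asking.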
>>
>>101439327
(Laughs in 2060 6GB)
>>
>>101439327
say that you want long and detailed replies in the system prompt
>>
>>101439355
Damn. What are you using?
>>
>>101439368
gemma-2-9b-it.Q4_K and mixtral-8x7b-v0.1.Q4_K_M
Like >>101439356 said the system prompt helps.
>>
>>101439396 (me)
>>
>>101439327
you have 8GB of VRAM, so you can fit an 8B model easily. look for the smaller gemma and llama3 versions and find a gguf quant that you can run.
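something like this with llama-cpp-python is enough to sanity check it (just a sketch; the filename is an example, grab whatever ~5 GB Q4 gguf you like):
[code]
# minimal llama-cpp-python sketch (needs a build with GPU support);
# an 8B Q4_K_M gguf is ~5 GB, which leaves room for context on an 8 GB card
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer to the 2070
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You write long, detailed, multi-paragraph replies."},
        {"role": "user", "content": "hi"},
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
[/code]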
>>
>>101431757
Weren't MoEs cheaper to train?
>>101431838
You could use a lot of RAM or even terabytes of storage on SSDs and inference would still be fast. CPUmaxxing is the way to go.

Also, that recent scaling paper claimed MoEs need only about 1.3x as many parameters to contain the same amount of information as dense models.
>>
>>101439429
Damn, not bad. Guess I need to figure out this shit some more. Thanks.
>>
>>101439438
You think it would be possible to cram an 11b model in somehow?
>>
>>101439448
This is the system prompt I use (among others).
https://huggingface.co/datasets/ChuckMcSneed/various_RP_system_prompts/blob/main/unknown-simple-proxy-for-tavern.txt
>>
File: 1514830956221.png (1.15 MB, 1001x1200)
i cannot fucking handle this youtube slop dude
im running the docker version of memgpt (fuckin wish i could use it with silly tavern instead) and all the tutorials for making it run with LLMs are for the CMD version instead of docker so im fuck outta luck
any advice? i tried putting kobold's api key in place of the openai key spot in the .env (formerly .env.example), which finally got memgpt.localhost to open, but then when I send a message it just thinks and then dies
I'll try and get a screen shot for ya one sec
>>
File: 1657345609038.png (587 KB, 719x713)
>>101439027
Then recommend me a new model, faggit.
>>
File: ITSHAPPENING.webm (588 KB, 1024x1024)
>Particularly, the synergy of BitNet b1.58 and Q-Sparse (can be equipped with MoE) provides the cornerstone and a clear path to revolutionize the efficiency, including cost and energy consumption, of future LLMs.
>>
>>101439456
Yes, an 11B gguf at Q4 will fit. A Q5 will depend on the context. Check this for reference:
https://huggingface.co/mradermacher/Fimbulvetr-11B-v2.1-16K-i1-GGUF
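Napkin math for why (the effective bits-per-weight figures are approximate, real files vary a little):
[code]
# rough gguf size estimate: params * effective bits-per-weight / 8
def gguf_gb(params_billion, bpw):
    return params_billion * bpw / 8  # GB, ignoring metadata

for quant, bpw in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7)]:
    print(f"11B {quant}: ~{gguf_gb(11, bpw):.1f} GB + KV cache")
# Q4_K_M ~6.6 GB leaves a couple of GB for context on an 8 GB card;
# Q5_K_M ~7.8 GB only works with the context dialed way down (or partial offload)
[/code]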
>>
File: asry.png (446 KB, 1280x720)
>>101439471
top is what it's stuck doing then boom it stops thinking and i get nothing
the CMD is just what it looks like immediately after connecting to memgpt.localhost
>>
i was just catching up on the last couple threads and noticed this appeared 2 threads back, didnt get any replies and didnt get caught in the recap that i can see, so gonna repost it
>>101426978
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
https://arxiv.org/abs/2407.10969
>We introduce, Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through-estimator to the training. The key results from this work are, (1) Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time; (2) We present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in different settings, including training-from-scratch, continue-training of off-the-shelf LLMs, and finetuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). Particularly, the synergy of BitNet b1.58 and Q-Sparse (can be equipped with MoE) provides the cornerstone and a clear path to revolutionize the efficiency, including cost and energy consumption, of future LLMs.
from the bitnet team. seems it didn't get posted here yet

>>
>up to 15 characters made on local setup
>all of them are just various fluff around my poorly hidden breeding kink
i need new material, there's only so many ways i can spin scenarios before i have to get into weird shit
>>
>>101439990
According to this paper, BitNet models have an optimal sparsity of about 60%, so the improvement is significant but not groundbreaking.

MoE models can also be considered "sparse", and if you can have a model with a million experts like that other Google paper claims, with the optimal number of active experts being in the hundreds, inference can be **orders of magnitude** faster than with current models or any future BitNet model.
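Toy numbers, entirely made up rather than taken from the paper, just to show why that gives orders of magnitude instead of BitNet's roughly constant factor:
[code]
# made-up toy numbers for the million-expert argument, not figures from the paper
total_experts  = 1_000_000
expert_params  = 1_000        # tiny experts, roughly single-neuron scale
active_experts = 512
stored_ffn = total_experts * expert_params    # 1e9 params of stored knowledge
active_ffn = active_experts * expert_params   # ~5e5 params actually touched per token
print(f"FFN params touched per token: 1/{stored_ffn // active_ffn} of the total")
[/code]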
>>
>>101440013
All roads lead to Rome.
"New material" is either just different preludes to the same kink moment, or you need to find in you a different kink to be drawn toward.

The end result—squirt squirt—is the same no matter how you get there. Unless you change that intention, it's all just variations on a theme.

So why not ask your LLM to come up with some new material for you? Literally, at the end of a role play ask it for new ideas. If your context is large enough for it to see what you've done, it might come up with some neat new stuff.
>>
File: never ever.png (549 KB, 1510x856)
>>101439990
>>
>>101440064
If Meta wants to safety themselves into irrelevance, that's their prerogative. Waste of their H100 farm, but the Qwen team has already shown interest in BitNet.
>>
File: file.png (19 KB, 967x162)
>>101439990
interestingly they say in here that off the shelf LLMs can be "continue-trained" to make use of Q-sparse
>>
>>101440043
i don't even really see it as a "kink", and i hate that word anyway
i've always been a boring vanillafag
i might try your suggestion though
>>
File: file.png (123 KB, 962x703)
>>101440134
this was for mistral-7b
>>
>>101440110
They mean the risk of unstable training runs and of wasting millions on experiments that will only be good for proving that mamba/SSMs fail to scale. Not safetyism risk.
>>
>>101440136
>i don't even really see it as a "kink"
It was your word choice, so maybe you do.

>i've always been a boring vanillafag
I could describe myself the same way, but I know where my kinks lie, and LLMs have been an interesting way to see exactly which details of those topics "work" and which don't.

But definitely let the LLM try things. Some of the most interesting RPs I've had were by playing through a part that was a "nah" and then it went someplace I would never have thought of. And then it's hot till the context fills causing sudden derp and collapse and sadness.
>>
>>101440187
Semantics. Meta has had over a year and failed to innovate at all.
>>
https://www.semianalysis.com/p/gb200-hardware-architecture-and-component
neat
>>
>>101440110
>If Meta wants to safety themselves into irrelevance, that's their prerogative. Waste of their H100 farm, but the Qwen team has already shown interest in BitNet.
I think the first non-meme BitNet model will come from the Mistral team; they're the only ones making non-transformer models (they've made a MoE model and a Mamba model).
>>
File: SubvertedDemocracy.jpg (31 KB, 640x708)
Sup /lmg/. I'm looking for an open source project that allows me to have a local server with:

1- An OpenAI compatible API

2- Allows for multi-user or multi-request use. Ideally it won't run them in parallel but will queue them (i.e. at any time it will be running inference for only one request, but it won't crash or reject requests while inference is running).

3- allows for multiple models but doesn't load more than one at any single time.

4- Ideally, it would seamlessly "hot-swap" models as requested (if a new request needs a different model, it will automatically unload the current model and load the new one).

Llama-cpp-python allows for all of the above except #2. I want to have a single LLM server and use it for multiple client apps.

Pic unrelated.
>>
>>101440043
How do you ask for that stuff?
>>
>>101440043
at some point I think anons will need to build their own preference optimization dataset based on their fetishes and then finetune models with it. maybe make a google form or something to click through that generates the dataset when you're done
>>
>>101440611
ollama can do all that. But if you want really good multi-user shit, you have to use vLLM.
>>
>>101440724
Pursuant to >>101440013, I tried,
>Someone on 4chan laments that he's made 15 character definitions for his LLM to role play as, but they all play into his breeding kink so narrowly that they're becoming repetitive and requiring him to get into "weird shit" to make them interesting. Please list seven kinds of characters that his LLM could role play as that offer something interesting to explore while still ultimately becoming a scenario where his character will begin producing offspring with his LLM's character. Consider all kinds and genres of fiction for ideas of what kinds of people or things these role play partner characters could be.

Removing the explanations to save space in one post, it offered:
1. Space Colonist
2. Ancient God/Goddess
3. Time Traveler
4. Shapeshifter
5. AI Entity
6. Nature Spirit
7. Alien Hybrid

Those sound like good ways to make both the run up to the kink and the consequences of the kink have something fresh to offer.

>>101440750
>their own preference optimization dataset based on their fetishes
That sounds like the road to boredom. If you move toward what you know you want, you'll get the same things again and again, as >>101440013 complained about. Guide the AI away from your no-gos and dealbreakers, and move toward things you don't have much of an opinion of, and you'll be able to find new things that you didn't know you would like. And it'll be trivial to work in a personal kink on the fly when it's wanted.
>>
>Added --unpack, a new self-extraction feature that allows KoboldCpp binary releases to be unpacked into an empty directory. This allows easy modification and access to the files and contents embedded inside the PyInstaller. Can also be used in the GUI launcher.
Holy heckin based
>>
What do you do after shooting your load in rp context?
Close chat and start a new one?
>>
>>101440611
>>101440898
Forgot to add...

* Partial GPU/CPU offloading
>>
you're going to be able to make videos that look real
>>
>>101441368
people be questionin the legitimacy of livestreamed, realtime footage
if that can be fake then there's no such thing as real
>>
>>101441108
If you need CPU+GPU hybrid inference, llama.cpp/ggml is your only choice in terms of backend.
The llama.cpp HTTP server has an OAI-compatible API and will queue requests by default but model hot-swapping is not implemented.
Ooba I think lets you load/unload models via the API but I don't know if it's OAI-compatible.
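For reference, pointing the stock OpenAI client at the llama.cpp server looks roughly like this (a sketch: default port 8080, and the model name is ignored because the server answers with whatever model it has loaded):
[code]
# sketch: start with e.g. `llama-server -m model.gguf --port 8080`, then any OAI
# client works for chat completions; concurrent requests get queued by default
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")
resp = client.chat.completions.create(
    model="whatever",  # ignored, the server serves the loaded model
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
[/code]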
>>
So are we just never going to get an HF version of mamba codestral that works outside of Mistral's shitty basic bitch backend?
>>
>>101441409
ollama queues by default, can do parallel requests, can easily swap models via the API, and is OAI-compatible.
>>
>>101441474
>mamba
>works
>>
>>101441409
>The llama.cpp HTTP server has an OAI-compatible API

Last I checked, only the chat API is OAI-compatible; the regular text completion endpoint isn't.
>>
>>101441512
mamba is a supported model-type since like transformers 4.39
Why shouldn't it work?
>>
>>101441511
ollama just runs the llama.cpp server in the background, petra.
>>
>>101441601
okay
use it
>>
>>101441654
Well I can't because I'm not going to install Mistral or Mamba's shitty inference packages since neither of them seem to have an API which renders them utterly fucking useless.
>>
>>101441679
well then it doesn't work does it
>>
Websim has been around for a few months and seems quite popular now. I never cared much to try it; is it actually great? And is it, or something similar, open source? The official website wants me to log in with Google. Can you run it with a local model?
>>
so I installed the ollama cli and ran llama3
its pretty cool but i hoped /save sessionName would save the chat so I can come back to it next time with llama remembering what was said, but apparently that's not the case
how do i save/load chat history so i can continue the conversation?
>>
>>101442061
Wrong site: reddit.com/r/LocalLLaMA/
>>
>>101441934
For once, doomer Anon, on this one exact topic, you are correct. It is unironically over and there's absolutely no hope.
>>
>>101442109
sorry, I thought this was local models general
>>
>>101442219
ollama is not beloved
>>
Does anybody know if the whole slot system from llama-server works with Silly, or would I need to change the way Silly calls the API to specify a slot or something of the sort?
You can have different prompt caches per slot, right? That would be pretty cool when switching between cards or even when using things like the summary extension.
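From my reading of the server README (untested), the raw API side would be something like this, assuming the id_slot and cache_prompt fields still behave the way they're documented:
[code]
# untested sketch: run llama-server with --parallel 2 to get two slots, then pin
# each card's chats to its own slot so the prompt caches don't clobber each other
import requests

def complete(prompt, slot):
    r = requests.post("http://localhost:8080/completion", json={
        "prompt": prompt,
        "n_predict": 256,
        "id_slot": slot,       # -1 = any free slot, >=0 pins a specific one
        "cache_prompt": True,  # reuse the cached KV for the unchanged prefix
    })
    return r.json()["content"]

print(complete("<card A history>...", slot=0))
print(complete("<card B history>...", slot=1))
[/code]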
>>
>>101440202
>>101440187
I think it was too early for them to get their clusters fully online, and they're still in the process of acquiring and building more. They began training, and set in stone what Llama 3 was going to be, before some of the currently hyped research could prove scalable. Llama 4 is probably going to be the one with a more unique architecture.

However, it's pretty normal for a big corporation to lag behind startups when it comes to smaller-scale, faster-paced rollouts of technology. Their advantage is that when they do roll out a new product, it's done with more money. That doesn't always equal a better product. In the case of LLMs it means they can train on something like 15T tokens, or train a 400B dense model, or whatever. Startups may come out with a BitNet or Jamba or whatever sooner, but it'll take a megacorp to produce a BitNet or Jamba with 15T tokens pumped into it, or at 400B, etc.
>>
>>101442061
Maybe ollama has updated and fixed this since I used it a month or two ago, but I found the /save feature to be total ass. The parser was busted and many character sequences would kill the parser, causing it to save only the first few turns of the conversation.

I could avoid them on my side (for example, NEVER end a turn with a double quote; a space or stray period after it would be fine, but if the turn ended on a quotation mark from dialogue, that was the end of the save). Of course, the AI could write killing sequences too, so even if I'm careful it's a doomed chat.

I roll Kobold now. Much easier to adjust settings now that I know them, I don't have to play Ollama's silly JSON and renamed file game, and I can use (almost) all of the models.

Ollama is a great introduction but the instant you want to do more, step up to a better wrapper.
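If you're dead set on staying in the terminal with ollama, the workaround is to keep the history yourself and talk to its HTTP API directly instead of trusting the repl's /save. Rough, untested sketch:
[code]
# rough untested sketch: persist the conversation to a json file and replay it
# through ollama's /api/chat endpoint on every turn
import json, pathlib, requests

HIST = pathlib.Path("chat_history.json")
messages = json.loads(HIST.read_text()) if HIST.exists() else []

messages.append({"role": "user", "content": input("> ")})
resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3",
    "messages": messages,
    "stream": False,
}).json()
messages.append(resp["message"])
print(resp["message"]["content"])
HIST.write_text(json.dumps(messages, indent=2))
[/code]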
>>
>>101442269
so they allow training on the output? If so that would probably be great for Mistral and similar
>>
>>101439126
Isn't Mixture of a Million Experts basically Marvin Minsky's Society of Mind? Also, with the fact that it could theoretically have lifelong learning...

Bros, we're actually going to get lifelike ai gfs in our life times, aren't we?
>>
so can anyone actually even test the 405b? by maybe renting some gpus online or something?
>>
>>101442432
I believe someone from huggingface said they'd host it (though I'm quite confused that the hugging.chat models are always broken)
>>
>>101442286
thank you for an actual answer
kobold seems interesting but i wanted something working in terminal for now
I guess I will try some ollama terminal clients, I see there are quite a few
can anyone recommend a specific one?
>>
>>101442432
If any providers bother with it, it will probably be on OpenRouter, but for $2 a gen I imagine
>>
>>101442443
I too enjoyed that Ollama was terminal based at first, and it was something I wanted when getting started. But then I wanted to change settings without curling one line JSON strings, didn't like the multi line text bug, the inability to save, etc, so I switched to Kobold and now it's trivial for me to reach it from my phone when I step away from my computer, I have access to important settings, state saving works correctly, and easy access to editing the document to fix errors or to reroll a response is very nice to have.
>>
>>101439447
>Weren't MoEs cheaper to train?
Yeah, generally. It wasn't necessarily true for this "distributed" FFW method, though. Looking at the paper, it should be more FLOP-efficient. I'm actually warming up to the concept as I dig into it a bit more; there might be something really good here.

This paper alone isn't enough to sell the idea, but I think a few minor improvements and suddenly this becomes the new training paradigm.
>>
>>101442219
Yeah, this is not the ollama tech support general. Go to Reddit to shill your scam.
>>
state space 1 million expert bitnet 1T parameter model when?
>>
>>101442599
Don't forget JEPA, native multimodality, and multi-token prediction.
>>
>>101442599
BitNet probably wouldn't work well with experts on the order of thousands of parameters in size.
>>
>>101442599
we need to return to good old lstms
>>
mambeleonbytejepabitnetMoaME 600B when
>>
>>101442599
with fill in the middle support
>>
>in my system prompt I give the narrator a personality, since that decreases slop
>after plapping the character from the card, ask what the narrator thinks so far using OOC
>she's horny
>propose she join in on the fun herself
>now I get to plap the original character as well as the narrator turned into a character
Let's fucking go.
>>
llama.cpp has support for some NPU now.

>>101442621
Wouldn't it be hilarious if the next step is to go back but with a couple of adjustments and at a bigger scale?
It's not even absurd to think that since that kind of thing happens all the time.
>>
>>101442697
I'll type the weights by hand
>>
https://github.com/ggerganov/llama.cpp/pull/8543
>Add support for Chameleon #8543
that one anon will be happy
>For now, this implementation only supports text->text inference and serves as base to implement the (more interesting) image->text, text->image and interleaved pipelines. However, such an implementation will probably require some changes to the CLI and internal architecture, so I suggest to do this in a separate PR.
oh...
>>
>>101442599
right after you hang yourself.
>>
>>101442729
Interesting. Hopefully now we can get some benchmarks of how those new laptops do with LLMs.
>>
>>101442729
isn't there any standard NPU api?
>>
>>101442754
they've still got poor memory bandwidth compared to low end nvidia gpus
>>
>>101442792
Doubt it.
It's probably the same situation as GPUs where each vendor has its own computing architecture with its own APIs.
>>
>>101442804
Well yeah. But low-power windoze/linux laptops that can run LLMs are still pretty good to have exist. It would also be good to know how the LPDDR5X in them performs, and whether it or the compute is the bottleneck in these machines. We could then extrapolate how a desktop with a similar chip could perform paired with a separate GPU.
>>
Why don't they just add matrix multiplication to slide rules?
>>
>>101440064
Meta became irrelevant when they started to filter their pre-training dataset with llamaguard. Claude shows that a diverse dataset with no regard for safety produces great results. A model that couldn't learn anything 'unsafe' because the data it was fed was designed to be 99% safe will never be good.
>>
>>101443047
hi petra
>>
>>101439308
Disregarding nuclear safety regulations with Miku
>>
File: teto.jpg (158 KB, 1024x1024)
How do I constrain the model's output in tabbyAPI? I tried "json_schema": {"type": "string", "enum": ["Yes", "No", "Maybe"]}, but I keep getting errors:
ERROR: ExLlamaV2Sampler.sample(
ERROR: File "/home/petra/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/sampler.py", line 247, in
sample
ERROR: assert pass_tokens, "Filter excluded all tokens"
ERROR: AssertionError: Filter excluded all tokens
ERROR: Sent to request: Completion aborted. Maybe the model was unloaded? Please check the server console.
>>
>>101443072
Are you retarded or just schizophrenic?
>>
>>101443047
This, this is why Command-R is good, and partially what makes WLM 8x22b wayyy better than 8x22b instruct. (Although I think WLM just has some other interesting continued-pretraining and finetuning techniques up their sleeve, iirc)
>>
File: 1695520564243917.png (20 KB, 1009x382)
>>101443222
When will this meme die?
>>
>>101443244
if you mean the huggingface leaderboard meme, whenever you stop posting it
>>
>>101443222
>what makes WLM 8x22b wayyy better
When are the mikufags going to drop this meme? How is this related to pretraining when it's a finetune done with GPT-4 outputs?
>>
>>101443222

They have a paper out for wizard. You can replicate if you have the compute and money.

They ran an offline arena and basically trained on the best outputs in that on something, over and over again. SFT, DPO, PPO i believe? Reward models type shit.
>>
>>101443349

Basically.

https://x.com/victorsungo/status/1811427047341776947?t=k7ZXwSCRnYKBW_7Rj0_q6w&s=19
>>
File: file.png (479 KB, 593x812)
Duality of /lmg/
>>
File: 1719351514748681.jpg (575 KB, 2048x2048)
>>101443244
>Aktually according to the HF leaderboard...
>t. Never run Mistral8x22 or Wiz8x22 for himself
>>101443503
>Yes with a 5% lead
We are so back bros
>>
>>101443158
good image
>>
>>101443047
buy an ad
>>
>>101439122
>>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

what am I gonna need to run this?
>>
Will we even need to make videos anymore if we can just generate them?
>>
>>101443763
There are artists still making art even with AI generated art. There are still writers despite AI being able to write. There will still be filmmakers despite AI being able to make film. Some people just like to create.
>>
>>101443724
At least a raspberry pi
>>
>>101443763
>t. the least illiterate AI bro
>>
I have been doing some testing between GGUF and EXL2.

I have always preferred exl2 as it was faster and had Q4 cache for KV.

But the speed difference seems to have vanished, and llama.cpp supports quantized (Q4) KV cache too now.

All results averaged over 3 runs, using SillyTavern as the front end, with FA and Q4 cache enabled in both exllama (mandatory in TabbyAPI) and llama.cpp:

TabbyAPI Backend (ExLlamaV2 0.1.7):
WizardLM2 8x22B Exl2 @ 4.0bpw 24.1t/s

llama.cpp Backend (pulled 2 hours ago):
WizardLM2 8x22B GGUF @ Q4_K_M 25.2t/s

Textgenwebui Backend Exl2:
WizardLM2 8x22B Exl2 @ 4.0bpw 22.1t/s

Textgenwebui Backend GGUF:
WizardLM2 8x22B Exl2 @ 4.0bpw 23.2t/s


Given llama.cpp's wider support, and thus better device compatibility and faster support for new models: is there any reason to still use exl2?

System:
4x3090's at 250w max
EPYC 7402
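If anyone wants to sanity check without Silly in the loop, a loop like this against either backend's OAI endpoint gives comparable numbers (just a sketch, not exactly what I ran; base_url/port are examples, tabby defaults to 5000):
[code]
# quick-and-dirty throughput check against any OAI-compatible endpoint (sketch)
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="dummy")

speeds = []
for _ in range(3):
    t0 = time.time()
    r = client.chat.completions.create(
        model="loaded-model",  # name ignored by llama-server, required by the client
        messages=[{"role": "user", "content": "Write a long story about anything."}],
        max_tokens=500,
    )
    speeds.append(r.usage.completion_tokens / (time.time() - t0))
print(f"{sum(speeds) / len(speeds):.1f} t/s averaged over 3 runs (includes prompt processing)")
[/code]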
>>
>>101444001
>>Textgenwebui Backend GGUF:
>>WizardLM2 8x22B Exl2 @ 4.0bpw 23.2t/s

I meant WizardLM2 8x22B GGUF @ Q4_K_M 23.2t/s
>>
>>101444001
Can you also provide results for prompt processing?
>>
>>101444001
>Is there any reason to still use exl2?
Unironically no.
>>
>>101444001
>those numbers
big if true
>>
>>101444001
exl2sisters... what went wrong?
>>
>>101444001
The llama.cpp server is too rough around the edges and tabbyapi is more polished. I would switch back to exllama if Gemma 2 worked as well as it does with llama.cpp.
>>
>>101444001
Nah, I hate exllamav2 with a passion now because they keep breaking their god damn pip package

switched to llama.cpp when they increased the speed and haven't looked back
>>
>>101443885
And there are still blacksmiths hand forging horseshoes.

What matters is if AI will completely collapse the manual art market, or if it'll be like music has been, with synths and DAWs rolling into the toolkit and letting new sounds enter the ecosystem and more artists offer their ideas.
>>
current foss models are barely better than chatgpt 3.5, seems like fossissies lost in the end.
>>
>>101444309
>with synths and DAWs rolling into the toolkit and letting new sounds enter the ecosystem and more artists offer their ideas.
For most commercial endeavors, I think that's what it will be.
Some market segments like commercials might end up as mostly AI generated at some point, however.
>>
https://x.com/RuoyuSun_UI/status/1813635251652227505
Big
>>
>>101444647
"gradually overlap" aka it's not as good
if it's more stable than SGD (and it certainly seems that way from the graph) it's definitely an option for people without a lot of memory aka local stuff
>>
Does ST not get token probabilities with Llama.cpp? I checked the box and went into the token probabilities tab but nothing is appearing in it.
>>
>>101444380
For many tasks gemma 27b feels just as good as GPT4-o
>>
>>101444083

gguf vs exl2 anon here

GGUF:

1st run: prompt eval time = 742.80 ms / 380 tokens ( 1.95 ms per token, 511.58 tokens per second)
generation eval time = 19797.38 ms / 500 runs ( 39.59 ms per token, 25.26 tokens per second)

2nd run: prompt eval time = 157.15 ms / 1 tokens ( 157.15 ms per token, 6.36 tokens per second)
generation eval time = 19793.48 ms / 500 runs ( 39.59 ms per token, 25.26 tokens per second)

(restarted llama.cpp server)
3rd run: prompt eval time = 642.11 ms / 380 tokens ( 1.69 ms per token, 591.80 tokens per second)
generation eval time = 19772.64 ms / 500 runs ( 39.55 ms per token, 25.29 tokens per second)

EXL2:

1st run: Metrics: 500 tokens generated in 21.56 seconds (Queue: 0.0 s, Process: 0 cached tokens and 380 new tokens at 315.16 T/s,
Generate: 24.57 T/s, Context: 380 tokens)

2nd swipe: Metrics: 500 tokens generated in 20.29 seconds (Queue: 0.0 s, Process: 379 cached tokens and 1 new tokens at 13.75 T/s,
Generate: 24.73 T/s, Context: 380 tokens)

(restarted tabby to avoid caching)
3rd swipe: Metrics: 500 tokens generated in 21.55 seconds (Queue: 0.0 s, Process: 0 cached tokens and 380 new tokens at 314.84 T/s,
Generate: 24.58 T/s, Context: 380 tokens)


So... llama.cpp is also faster in prompt processing. I have noticed it as the time to first token feels quicker in llama.cpp

Note: the llama.cpp server also loads the model faster. I don't know why; it seems to load into each 3090 in parallel, while exllama loads them in series and waits for each one to be filled.


Im a turboderp fanboy and until today I assumed that exl2 was always better. But it seems that llama.cpp has made a lot of progress.

>>101444182
is it?

by tabbyapi do you mean exllamav2? the tabbyapi author says in his readme that tabby is not production-ready and we should use aphrodite. But to be honest tabby+exllama has been rock solid for me.
>>
>>101444302
I haven't had any pip problem to be honest. I use python envs though. Just following the readme and using the whl works fine for me.
>>
>>101441409
what's your stance on SCALE
https://docs.scale-lang.com/
>>
>>101444771
I have a shared env for the experiments I run, and twice now installing a new package that updated the exllamav2 package broke it in a non-obvious way. They adjust how parameters are handled too often, and it quietly just breaks generation. Cue tens of hours finding out why my pipeline is randomly breaking, because sometimes it still outputs decent stuff.
>>
Is it advised to have character speak about their actions in first person instead of third?
Like instead of *{char} smiles* say *I smile*, but still in the *action* markup.
Would it solve the heavy narration bias of many models that tend to describe too much and talk too little?
>>
>>101444786
I will be happy to cooperate for wider hardware support.
But it doesn't fix the fundamental issue that GPU performance depends very strongly on hardware details.
So something like this will never be a replacement for e.g. a dedicated ROCm implementation.
Also compared to HIP it will not be possible to make informed decisions regarding which kernel (configurations) should be used when running on AMD.
>>
>>101443158
Marvelous, "grammar_string" doesn't work as well
ERROR: File "/home/petra/tabbyAPI/endpoints/OAI/utils/completion.py", line 135, in stream_generate_completion
ERROR: raise generation
ERROR: File "/home/petra/tabbyAPI/endpoints/OAI/utils/completion.py", line 87, in _stream_collector
ERROR: async for generation in new_generation:
ERROR: File "/home/petra/tabbyAPI/backends/exllamav2/model.py", line 1070, in generate_gen
ERROR: grammar_handler.add_ebnf_filter(grammar_string, self.model, self.tokenizer)
ERROR: File "/home/petra/tabbyAPI/backends/exllamav2/grammar.py", line 147, in add_ebnf_filter
ERROR: ebnf_filter = ExLlamaV2EbnfFilter(model, tokenizer, ebnf_string)
ERROR: File "/home/petra/tabbyAPI/backends/exllamav2/grammar.py", line 46, in __init__
ERROR: self.state = self.fsm.first_state
ERROR: AttributeError: 'CFGFSM' object has no attribute 'first_state'. Did you mean: 'final_state'?
ERROR: Sent to request: Completion aborted. Please check the server console.
>>
File: 1721166662887634.png (51 KB, 1482x342)
>gemma 27b feels just as good as GPT4-o
>>
>>101444738
>"gradually overlap" aka it's not as good
can you elaborate on that?
>>
>>101444743
such as?
>>
>>101444911
I just asked the character I'm talking to the exact same question (Q6_K), 1 try, she answered "No, that is incorrect. The element with atomic number 82 is lead. Uranium has the atomic number 92."
>>
>>101444986
Actually most tasks except code, because GPT4-o outputs longer code
>>
still trying out dry with good success but then i came across this leddit post explaining it more. turns out i should be using rep pen still, but i have it turned off atm to test dry specifically. going to leave it off a bit longer but i'll try with both.
>https://old.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/
>>
>>101444425
>Some market segments like commercials might end up as mostly AI generated at some point
I think the space for that is going to be targeted advertising, which will be fully customized, generated-on-the-fly Skinner-box bait, far beyond the "HOT SINGLES IN ${YOUR} AREA" trash. Likely it'll be a "smart" Telescreen thing where it knows your family's viewing habits and AI gens tailored inserts that it can force you to look at because you didn't support Sceptre as the last television supplier.

For mass marketing I think it'll just be another tool in the box. There's that pizza ad that screams "AI assisted" from beginning to end. A shame that it's still not as cool as Pepperoni Hug Spot but it's a sign of the times when they sell pizza with an exploding head when formerly that was the reason Scanners became a meme.
>>
>>101445019
so I have to say "ahh ahh mistress" to make it smarter?
>>
>>101445084
yes, that's claudes secret
>>
>>101444756
Thanks! You've saved me a lot of time.
>>
>>101444971
It initially doesn't learn as fast, and it's really not guaranteed to ever catch up to base adam
It simply performs worse, though with the memory savings it might be worth it in some cases
>>
>>101445147
So I guess that's something that could be tried first as a cheap method, and if it's not successful you fall back to regular Adam. I see.
>>
>>101443158
Fuck, I solved it! It seems that exllamav2 always expects an object, so grammar like this works:
{
  "type": "object",
  "properties": {
    "result": {
      "type": "string",
      "enum": ["Yes!", "No?", "Maybe..."]
    }
  },
  "required": ["result"]
}
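For anyone else fighting this, the request body ends up looking roughly like this (sketch against tabby's completion endpoint; the port and header are the defaults as far as I know):
[code]
# sketch: tabbyAPI's OAI-style endpoint with the json_schema field from above
import requests

schema = {
    "type": "object",
    "properties": {
        "result": {"type": "string", "enum": ["Yes!", "No?", "Maybe..."]},
    },
    "required": ["result"],
}
r = requests.post(
    "http://localhost:5000/v1/completions",
    headers={"x-api-key": "<your tabby api key>"},
    json={"prompt": "Is the sky blue? Answer in JSON.\n", "max_tokens": 32,
          "json_schema": schema},
)
print(r.json()["choices"][0]["text"])
[/code]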
>>
who is petra?
>>
>>101444001
Now try vLLM.
>>
>try Lunaris, it's shit
>try Stheno, it's shit
>try Gemma, it's shit

I fell for the vramlet "good model" meme
>>
>>101445195
Sorry anon, I'm actually retarded. I just realized I read the graph backwards. Based on what they're saying it performs better, though I very much doubt that's true in general. Will wait for more people to try it out, but it might be good.
>>
>>101445267
A Sao fanboy/fangirl from the UK.
>>
who is sao?
>>
>>101445313
Petra's crush.
>>
>>101442725
the only thing you are plapping is your hand, anon
>>
>>101445295
that isn't a vramlet meme its 'i have a laptop with 4gb vram and 12gb ram' meme which covers most of the thirdworlders who post here. you can run 70b just fine with a good cpu and ram
>>
>>101445323
pic?
>>
>>101444866
Recommend 8-13B models for doing the actual chatting instead of narration. I'm kinda tired of every conversation sliding into the purple prose.
>>
i have 88GB of VRAM and i want to run either https://huggingface.co/OpenGVLab/InternVL2-40B or https://huggingface.co/facebook/chameleon-30b. how? they are both multimodal models, but i dont know how to run them. do they work just fine with oobabooga or do i need some sort of specialized backend?
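worst case my fallback plan was plain transformers, something like this (untested, and the actual image+text chat call is different for each model, so that part has to come from the model card):
[code]
# untested fallback sketch: load with plain transformers and let accelerate shard
# the bf16 weights across all the GPUs; the inference call isn't shown because
# each of these models ships its own chat/image preprocessing in its repo code
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2-40B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map="auto",   # needs accelerate; ~80 GB of bf16 weights over 88 GB of VRAM is tight
    trust_remote_code=True,
).eval()
[/code]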
>>
>>101442729
> The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.