/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101021764 & >>101010179

►News
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
>(06/11) Google releases RecurrentGemma, based on a hybrid RNN architecture: https://hf.co/google/recurrentgemma-9b-it
>(06/06) Qwen2 releases, with better benchmarks than Llama 3: https://qwenlm.github.io/blog/qwen2/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 11__00729_.png (2.04 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101021764

--Running DeepSeekCoder-V2: Challenges and Potential Solutions: >>101024205 >>101024216 >>101024302
--Command-R+ Preset Recommendations for Enhanced Text Generation: >>101022120 >>101022759
--Q#_K_S Quants Outperform Q#_K_M in Coding Question Test: Precision and Perplexity in Focus: >>101023015 >>101023134 >>101023284 >>101023816 >>101026611 >>101026657 >>101026737 >>101026769 >>101027313 >>101027404 >>101027425
--Random Word Prompt for Diverse AI Outputs and Political Discussions: >>101025282 >>101026596 >>101026629 >>101029678
--Random Guy Achieves SOTA in ARC-AGI by Spamming GPT-4o API Calls: >>101023940 >>101024054 >>101024140 >>101024409 >>101024126 >>101024451 >>101024139 >>101025187 >>101024458 >>101025703 >>101028279
--Llama3's Impact on AI Hype and Community Enthusiasm: >>101023148 >>101023311 >>101023236 >>101023435 >>101024833
--AI Hype vs. Technology: The Future Beyond the Hype: >>101023019 >>101023102 >>101023533 >>101025934
--Remembering the Wonder of Early AI Days with Pygmalion and Pre-Pyg: >>101026015 >>101026071 >>101026309 >>101026408 >>101026166 >>101026204
--Memory Access Optimization, Not AMX, Behind Improved Inference: >>101022868 >>101021771
--MARS5-TTS: A New Text-to-Speech Model from CAMB.AI: >>101025898 >>101025919 >>101026352
--LLaMA 3B Censorship and Feminist Rhetoric in Sexual Technique and Relationship Advice: >>101025780 >>101025810 >>101026691
--DeepSeek API: Affordable but Limited Creativity for CPU-Maxxers: >>101021891 >>101022113 >>101022403 >>101022519
--Building an AI VTuber: Seeking Advice on Tech and Minimum Viable Product: >>101029508 >>101029577 >>101029652 >>101029613 >>101029665 >>101030246
--Anon's Quest to Run Nemotron on Old Hardware: Challenges and Possibilities: >>101022563 >>101022830 >>101026712
--Miku (free space): >>101023390 >>101023543 >>101024319 >>101024320

►Recent Highlight Posts from the Previous Thread: >>101021778
>>
>>101030715
why everyone suddenly horny for Teto?
>>
File: 1716747862937.png (27 KB, 380x421)
>>101030732
It is Tuesday. Tuesdays are for Teto.
>>
>>101030692
The normal one is a multistage pipeline by recapanon. I don't think he's released all the details of his method.
I just run a singleshot inference off a standard prompt when a new model with enough available context gets released to see how good it is at dealing with a huge mess of chaotic information. Prompt adherence on this new deepseek is pretty good, actually. Probably close to L3 70b levels of smarts but with huge context.
>>
>>101030724
>missed SOVLKINO
https://huggingface.co/alpindale/magnum-72b-v1
>>
File: distland.jpg (118 KB, 1750x846)
>>101030715
Thread Theme: https://www.youtube.com/watch?v=bF_1sV01QjE
>>
What is the maximum t/s I can get for a small model like LLaMA 3 8B? Is 500t/s feasible?
>>
>>101030743
isn't teto a hag?
>>
File: CR plus q4km.png (75 KB, 1138x530)
Tried CR+ Q4KM GGUF for several hours. It's too dry for RPing. Will stick to Mixtral 8x7B Instruct. Much faster and less dry. Didn't reach logic testing yet, but 300s+ per gen isn't worth it IMO.

>tfw sub 1 tokens/s
>>
File: file.png (24 KB, 476x268)
For the anon making the states extension - could you provide an example for what prompts you use?

I tried below verbatim

[Stop the Roleplay and Act as narrator] Describe {{char}}'s physical status and location.

Is this formatting wrong? It's still continuing as if it was an ordinary gen - speech and all.
>>
@ggerganov ji,

Namaste and greetings. I am writing to you today with a humble request to fix the control vector issue when using command-r with the language model. I am facing difficulties with accurate output generation due to this issue and believe that your expertise can resolve this matter. Many users in the community would benefit greatly from your kind attention to this matter.

Thank you for your time and consideration, and I look forward to your prompt response.
>>
>>101030815
*SCAMKINO
I still don't understand the lack of transparency. And /aicg/ is missing from the credits!
>>
File: 1697927143021543.png (3.46 MB, 1378x2039)
GOOD MORNING TETO!
>>
>>101030992
>red miku
>>
>>101030914
i'll think about it
>>
>>101030914
I'll fix it if you ERP with me
>>
>>101031035
ah ah mistress
>>
>>101031039
>mistress
try again
>>
>>101031047
ah ah nigger faggot
>>
>>101031047
ah ah mistre?
>>
>>101031035
*nuzzles ur bulgie wulgie* uwu
>>
>>101030914
Post this on the issue tracker.
>>
>https://huggingface.co/alpindale/magnum-72b-v1/tree/main

WHERE ARE THE QUANTS?!?!?!?
>>
>>101030992
It's not even midnight. Fuck off.
>>
>>101031097
>living in the past
>>
>>101030715
>enter
>downloads teto
>post
>leave
>>
File: import quick reply.png (127 KB, 612x905)
>>101030896

https://docs.sillytavern.app/usage/st-script/#using-the-llm

I use /gen [Prompt] Instructions.

/gen [Stop the roleplay and answer the question] What is {{char}}'s emotions right now? |
/popup <h3>Empathy:</h3><div>{{pipe}}</div>


>Mark 4 quick reply sets
https://files.catbox.moe/f61g7a.json

I also saw that they added multiple choices scripts. Didn't mess with it yet.
>>
the new deepseek 236b performed really well on my bespoke coding task test. code compiled without editing, executed correctly and it had a good explanation of each part of the code in a postscript.
This is the first model where I'd put the output above GPT4 for this specific coding task.
I'm impressed so far. Can anyone else confirm coding performance on their private benches?
I'll try and come up with some more complex tasks that exercise its higher context limit and see how well it is able to manage.
>>
Alpin/Sao/..., why are you all doing full finetunes nowadays? Are LoRAs/DoRAs/MoRAs/etc. doomed?
>>
https://youtu.be/Sf7r2XcLNEk?t=580
Apparently DBRX cost 10 million to train. He also says cost of training is going down by a factor of 4 every year.
>>
>>101030996
that was literally the joke, originally
>>
>>101031208
To be clear, Sao isn't doing full finetunes
>>
>>101031205
I'd have to run it at 2 bit. Damn. Maybe I should've CPUmaxxed after all.
>>
>>101031228
I think the 8B was FFT. Could be wrong though.
>>
>>101029508
I've been brainstorming a very similar thing, but less vtuber and more just virtual girlfriend. (What's the difference? One is streamed and one isn't, basically.)
Start with an output model (Live2D, vroid, whatever) that has many possible triggers.
Decide how to organize your context, what goes into the context (chat, current screen CLIP description maybe), and how to get the language model to trigger functions. Give it a vector database for chat history and a function to make mental notes to the vector db, a function to emote, a function to speak, etc.

This could probably all be wrapped in a nice little package with user choice of TTS, Live2D model, and STT, and people would go fucking crazy over it.

I don't have the energy or skill to pull that shit off myself, but if someone makes a github repo and architects the software, I'd contribute.
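For anyone who does have the energy, the core loop really isn't much code. Rough sketch below (untested; the endpoint, port, and function names are all made up, and it assumes an OpenAI-compatible local backend like tabby/ooba/llama.cpp server plus your own TTS/Live2D/vector-DB plumbing):

```python
# Rough sketch of the waifu/VTuber control loop described above.
# Assumptions: an OpenAI-compatible /v1/chat/completions endpoint on localhost,
# a model prompted to answer with a single JSON "action", and stub functions
# standing in for TTS / Live2D / the vector DB.
import json
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"  # hypothetical local backend

SYSTEM = (
    "You control a character. Reply ONLY with JSON: "
    '{"function": "speak"|"emote"|"note", "argument": "<text>"}'
)

def recall(query: str) -> str:
    return ""  # stub: query your vector DB for relevant chat history here

def speak(text: str): print("TTS:", text)    # stub: send text to the TTS engine
def emote(name: str): print("EMOTE:", name)  # stub: trigger a Live2D/vroid animation
def note(text: str):  print("NOTE:", text)   # stub: write a memory back to the vector DB

def step(user_message: str, history: list) -> None:
    context = recall(user_message)
    messages = [{"role": "system", "content": SYSTEM + "\nRelevant memories:\n" + context}]
    messages += history + [{"role": "user", "content": user_message}]
    reply = requests.post(API_URL, json={"messages": messages, "temperature": 0.7},
                          timeout=120).json()["choices"][0]["message"]["content"]
    try:
        action = json.loads(reply)
    except json.JSONDecodeError:
        speak(reply)  # model ignored the format; just say the raw text
        return
    handlers = {"speak": speak, "emote": emote, "note": note}
    handlers.get(action.get("function"), speak)(action.get("argument", ""))

if __name__ == "__main__":
    step("good morning!", history=[])
```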
>>
Cudadev have you tried using tinygrad yet with all those 4090s?
>>
Running into this trying to quantize "magnum" with AutoAWQ:
assert torch.isnan(w).sum() == 0
And I found this issue:
https://github.com/casper-hansen/AutoAWQ/issues/335
Should I just give up?
>>
File: ScienceMiku.png (1.52 MB, 832x1216)
>>101031461
>Should I just give up?
never give up!
Did you try commenting out the assert to see if it'll skip over the NaN weights?
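You could also check whether the NaNs are actually in the source weights before blaming AutoAWQ. Quick diagnostic sketch (assumes the model is downloaded locally as safetensors shards; adjust the path):

```python
# Scan a local HF checkpoint for NaN/Inf weights before quantizing.
import glob
import torch
from safetensors.torch import load_file

for shard in sorted(glob.glob("magnum-72b-v1/*.safetensors")):
    tensors = load_file(shard)
    for name, w in tensors.items():
        if not w.is_floating_point():
            continue
        bad = torch.isnan(w).sum().item() + torch.isinf(w).sum().item()
        if bad:
            print(f"{shard}: {name} has {bad} NaN/Inf values")
```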
>>
I was told Koboldcpp claimed they 'invented' context shifting, but really it was just a rewritten feature from llama.cpp. I can't seem to find the info in llama.cpp's docs, how do you enable context shifting in lcpp? is it called something different?
>>
>>101031555
-cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)
>>
>>101031606
Thank you! Much appreciated anon. Necessary when using llama 3 70b.
>>
>>101031508
I did and I think it was a waste of time. I will just try AutoGPTQ, I guess.
>>
have we reached the full potential of torch primitives
>>
Was sent here from the /aicg/ thread...

What's the local equivalent of something like spicychat.ai?
I understand that SillyTavern is a front end, do I use that or agnai?
I assume I should then look into a backend which the models actually run on?
What kind of hardware would I need to run models of similar performance and functionality to something like spicychat, and to run (I presume) chub character cards?

I've heard of Kobold Horde and how they run on volunteers to provide service for free. If my hardware isn't strong enough to run a model capable of what I want, does that mean I should run some lower end model, and join the Kobold Horde to farm Kudos to use on larger models that I can't run locally?

I really don't want to spend money on a subscription or tokens, I've been reading that people are paying up to hundreds of dollars for these online services for them to read their chats...

I've only been told to use SillyTavern and that they only use online backends, which I don't want to do because of the high costs and stuff. From what I understand a lot of the online services are also censored and can also be biased in some way which is why I would like to run locally if possible.
>>
>>101031737
you are yapping, whats your current hardware
>>
File: 11__00726_.png (2 MB, 1024x1024)
>>101031649
Np anon, yeah I can imagine. That 8k context isn't a lot to work with otherwise
>>
File: file.png (321 KB, 640x630)
>rm -rf ~/LLM
see you next year
>>
uuhhhh, there's a text to audio model, but is there an audio to audio model?
like, you hum a tune, and then AI does magic to turn that into, like, a heavy metal song
i'm very sleepy
>>
>>101031798
13900K, RTX 3090, 128gb DDR4 to run locally but with ambients of 33C during the day it starts pushing temps up really fast.
Storage server with a 10100, 32gb DDR4 and 1050TI 4GB, can probably upgrade the GPU to something with 6~12gb if farming Kudos is worth it.
>>
>>101031847
try the flavor of the month models like Stheno 3.2 via exl2
>>
>>101031847

Various Mixtral 8x7B quants at around 3.5 to 3.7 BPW (bits per weight) can fit into 24GB VRAM cards. Mixtral 8x7B is a good entry. I had some nice fun with BagelMistery Tour.

https://huggingface.co/intervitens/BagelMIsteryTour-v2-8x7B-3.7bpw-h6-exl2-rpcal
>>
>>101031809
won't be missed.
>>
>>101031847
You can run this backend:
https://github.com/theroyallab/tabbyAPI
And pick an exllamav2 quant like this one:
https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-DARE-TIES-3.5bpw-h6-exl2
And then you can connect SillyTavern to it.
If you want to run 70B models, you would need a 2nd 3090.
>>
>>101031805
On a quick try it looks like dynamic batching or continuous batching might be a different feature. Any other ones that come to mind?
>>
>>101031737
I just started playing around with LM studio this afternoon. It's pretty user-friendly and can give you an ST-compatible API. I'm still tinkering with it though. Outputs range from decent to schizo depending on what settings I use. Basic coom chats should work but more nuanced fetishes and storytelling will fall flat. If I had a nice 70B model, I'd be really well off but for now, I'm limping along with a 13B model, trying to squeeze as much quality out of it as I can.
>>
>>101031918
LM studio is proprietary, and it's just a llama.cpp wrapper.
>>
>>101031833
I've seen audio infill models before
>>
File: 1717083635766663.png (287 KB, 870x516)
>>101029508
I did some serious research about it, so here is what I found (since I won't be doing a VT AI anytime soon):
- Neuro used GPT-J 6B with a LoRA finetune on top of it using curated twitch chat from other streamers.
- TTS is Azure (which is a money pit according to Vedal) with a vocoder on top of it.
- STT is Whisper
- Vedal rented GPT4-V for the vision

The main point here is not to focus on the brain (LLM) but on the voice (TTS). Most of the complaints of others on AI VTuber are on the voice. Also stream it anytime that isn't 7-10pm UK time, go for the US timeslot.

Keep in mind that any AItuber starting now will be playing catchup to neuro, so they will already need to be as good, if not better, than her to be viable. The best bet is to find a niche that neuro doesn't fill and go for that, for example, heavy GFE and ASMR.

I hope that helped. Good luck.
>>
>>101031875
I've seen a few posts about Stheno v3.2 but also talk about Euryale?

>>101031899
Is there a guide or something I can read up to properly understand what all this bpw, 7b, L3-8B and other things stand for and the hardware requirements for them?

>>101031906
I don't understand backends at all. I saw koboldcpp/llamacpp or ooba, and now tabbyAPI; are there reasons to pick one specifically? Also, I'm reading that Llama sometimes devolves into zoomer speak and slang, is that because it's trained on Meta/FB data?
>>
>>101032014
You just got to lurk more and search google/chatgpt for terms you don't know. File size is a good estimation of whether the model can fit in VRAM in my experience. Don't forget context also takes up VRAM, so have some leeway.
>>
DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
https://arxiv.org/abs/2406.11427
>Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models for TTS. In this work, we present an efficient and scalable Diffusion Transformer (DiT) that utilizes off-the-shelf pre-trained text and speech encoders. Our approach addresses the challenge of text-speech alignment via cross-attention mechanisms with the prediction of the total length of speech representations. To achieve this, we enhance the DiT architecture to suit TTS and improve the alignment by incorporating semantic guidance into the latent space of speech. We scale the training dataset and the model size to 82K hours and 790M parameters, respectively. Our extensive experiments demonstrate that the large-scale diffusion model for TTS without domain-specific modeling not only simplifies the training pipeline but also yields superior or comparable zero-shot performance to state-of-the-art TTS models in terms of naturalness, intelligibility, and speaker similarity.
https://ditto-tts.github.io/
they have celeb clone examples on their site that sound pretty good. no weights but the paper has some good info on how they trained it. by KRAFTON which turns out to be the pubg devs so that probably explains why
>>
>>101031954
Another thing to keep in mind is that fans of Neuro are generally just as much fans of Vedal and the interactions they have.
>>
>>101031943
I'm looking for an easy way to load stuff up and have things work with as little fuss as possible. I may need to bite the bullet and use Kobold since I've heard it's not bad. I've also used ooga and wasn't that impressed with it, then again I probably wasn't using it right.
>>
>>101032014
llama.cpp is for running models on the CPU/GPU, it has a command line server, but it can be kinda obtuse to use. It runs models in the GGUF format.
kobold.cpp is a llama.cpp fork; it adds a UI and other things.
tabbyapi is a thin server that uses exllamav2 to run models exclusively on the GPU. It runs the models in the exl2 format.
ooba is another server + a UI, and it integrates llama.cpp, exllamav2, transformers, etc. It can run GGUF, exl2, unquantized models, etc.
I just use tabbyapi because I don't care about offloading to the CPU, and only use the exl2 quants, and I also don't care about a UI because I just use it with SillyTavern.
>>
>>101032014
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
>>
>>101031914
Pretty sure this is the one.
https://desuarchive.org/g/thread/99465799/#q99468872
https://desuarchive.org/g/thread/98032863/#q98038862
>>
>>101032144
Cool, looks like I'll be looking into kobold.cpp or ooba then. If I want to continue looking into dabbling with the Kobold Horde and Kudos, should I lean more into kobold.cpp or does that not matter?

>>101032145
Thanks, bookmarked it.
>>
>>101032209
Huh! Seems like you're right then, it doesn't seem to be working for me. Maybe it's a flash attention thing, I'll have to play with it
>>
>>101032209
I think the llama.cpp server might be designed like shit and it needs another parameter from the client that says "cache_prompt". They're stupid like that.
>>
>>101032252
I think koboldcpp has the option to host on the Horde built in. It also includes the same UI as https://lite.koboldai.net
>>
LieRE: Generalizing Rotary Position Encodings
https://arxiv.org/abs/2406.10322
>While Rotary Position Embeddings (RoPE) for natural language performs well and has become widely adopted, its adoption for other modalities has been slower. Here, we introduce Lie group Relative position Encodings (LieRE) that goes beyond RoPE in supporting higher dimensional inputs. We evaluate the performance of LieRE on 2D and 3D image classification tasks and observe that LieRE leads to marked improvements in performance (up to 6%), training efficiency (3.5x reduction), data efficiency (30%) compared to the baselines of RoFormer, DeiT III, RoPE-Mixed and Vision-Llama
really cool also big implications for multimodals
>>
File: teto a mood.jpg (266 KB, 2000x2000)
>>101032282
There's an explanation why it's not default, and Silly already enables it anyway. Try R-ing TFM
>cache_prompt: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are not guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: false
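If you're rolling your own client instead of using Silly, it's literally one field in the request body. Quick sketch (assuming the stock llama.cpp server on its default port):

```python
# Minimal /completion request to the llama.cpp server with prompt caching enabled.
import requests

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Once upon a time",
    "n_predict": 64,
    "cache_prompt": True,  # re-use the KV cache for the shared prefix
})
print(r.json()["content"])
```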
>>
>>101032374
Now imagine how much better it would be to have a --cache-prompts flag in the binary, instead of making you edit each client that uses the OpenAI API, completely unaware of the dogshit design that the llama.cpp devs came up with.
--cont-batching is like "cache prompts" and the other flag is like "but really do it". It's hilarious.
Fuck you, dumb teto poster.
>>
File: Untitled.png (139 KB, 1045x770)
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
https://arxiv.org/abs/2406.11271
>Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and three billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS.
https://github.com/mlfoundations/MINT-1T
>>
File: 1703570218680321.gif (2.68 MB, 220x272)
>>101032430
>>
File: Untitled.png (337 KB, 1047x837)
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
https://arxiv.org/abs/2406.10923
>Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reasoning: planning and integrating intermediate reasoning steps for understanding long-range videos with numerous frames. Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches. Our experiments show that current methods, including Captioner-Reasoner, Large Multimodal Model Instruction Fine-tuning, and Visual Programming, only marginally outperform a random baseline when tackling the challenges of Abstract Perception and Long-range Compositional Reasoning. To address these deficiencies, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR), which enhance Visual Programming by fostering role interaction awareness and progressively refining movie contexts and trope queries during reasoning processes, significantly improving performance by 15 F1 points. However, this performance still lags behind human levels (40 vs. 65 F1). Additionally, we introduce a new protocol to evaluate the necessity of Abstract Perception and Long-range Compositional Reasoning for task resolution. This is done by analyzing the code generated through Visual Programming using an Abstract Syntax Tree (AST), thereby confirming the increased complexity of TiM.
https://ander1119.github.io/TiM/
in the future you will be able to talk about a movie with your miku while you watch it. oh that guy trying to make a vtuber should be interested in this
>>
>>101031954
Neuro is definitely running something more advanced than GPT-J nowadays but that size region seems right, 7-8b, possibly a 13b L2 model in the latest "intelligence update" at most.

On top of GFE and ASMR I'm really thinking gaming is a good way. Yes it's more work but I think it's worth it. You can save yourself time by learning HarmonyLib and only playing Unity games where you can use Harmony to make mods that hook into in-game events. It doesn't have to be run by a neural network honestly as long as the gameplay is interesting and gets a lot of interaction from the AI. You only need 2-3 games as that's all Neuro can play.
>>
>>101032498
As nice as that is, why would you talk to someone during a movie? It would be nice to discus the movie with the model after it is done however. Though I would have to question how much of the movie it can even watch before it starts having to forget.
>>
>>101032498
This sounds like a perfect use case for mamba
>>
>>101032533
>why would you talk to someone during a movie?
It's the same as "why are you watching someone playing a game instead of playing it yourself"
AI commentators is the future bro
>>
There are already a few quants of the full version of DeepSeek Coder V2 Instruct.
Any cpumaxx anon can try it?

>https://huggingface.co/bullerwins/DeepSeek-Coder-V2-Instruct-GGUF
>>
>>101032498
neat
>>
File: Untitled.png (286 KB, 1286x1144)
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
https://arxiv.org/abs/2406.10774
>As the demand for long-context large language models (LLMs) increases, models with context windows of up to 128K or 1M tokens are becoming increasingly prevalent. However, long-context LLM inference is challenging since the inference speed decreases significantly as the sequence length grows. This slowdown is primarily caused by loading a large KV cache during self-attention. Previous works have shown that a small portion of critical tokens will dominate the attention outcomes. However, we observe the criticality of a token highly depends on the query. To this end, we propose Quest, a query-aware KV cache selection algorithm. Quest keeps track of the minimal and maximal Key values in KV cache pages and estimates the criticality of a given page using Query vectors. By only loading the Top-K critical KV cache pages for attention, Quest significantly speeds up self-attention without sacrificing accuracy. We show that Quest can achieve up to 2.23x self-attention speedup, which reduces inference latency by 7.03x while performing well on tasks with long dependencies with negligible accuracy loss.
https://github.com/mit-han-lab/Quest
code up. seems clever.
these were recent too
https://github.com/Zefan-Cai/PyramidKV
https://arxiv.org/abs/2405.06219
>>
>>101032717
XXth paper that claims better perf with KV cache
>>
File: Untitled.png (581 KB, 1106x2386)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
https://arxiv.org/abs/2406.11839
>Direct preference optimization (DPO) has shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the image condition. To address this problem, we propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference. Moreover, we introduce a reward anchor that forces the reward to be positive for chosen responses, thereby avoiding the decrease in their likelihood -- an intrinsic problem of relative preference optimization. Experiments on two multimodal LLMs of different sizes and three widely used benchmarks demonstrate that mDPO effectively addresses the unconditional preference problem in multimodal preference optimization and significantly improves model performance, particularly in reducing hallucination.
pretty interesting considering their initial usage of dpo actually performed better with no images (meaning things were fucked). better vlms soon
>>
>>101032498
>in the future
It's always in the future. I wonder what is possible right now. Like, feeding the movie script https://assets.scriptslug.com/live/pdf/scripts/being-john-malkovich-1999.pdf synchronized with subtitle timings and a prompt that will cut automated responses to 1% of the time, and also respond to talk initiated by user, but I guess llms will be just confused on what's going on as the context gets bigger
>>
I've been on a break bros, is there any hope left for 24gb vramlets at this point? I see people are unironically still running yuzu alter. are there any worthwhile models that aren't either 8b or 800b?
>>
>>101032867
>I see people are unironically still running yuzu alter.
yes hello, cputard here, dense models too slow, yuzu actually not very retarded at all
>>
>>101032867
Maybe the Qwen2 MoE?
>>
File: Untitled.png (576 KB, 1119x2352)
QTIP: Quantization with Trellises and Incoherence Processing
https://arxiv.org/abs/2406.11235
>Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing weights to low-precision datatypes. Since LLM inference is usually memory-bound, PTQ methods can improve inference throughput. Recent state-of-the-art PTQ approaches have converged on using vector quantization (VQ) to quantize multiple weights at once, which improves information utilization through better shaping. However, VQ requires a codebook with size exponential in the dimension. This limits current VQ-based PTQ works to low VQ dimensions (≤8) that in turn limit quantization quality. Here, we introduce QTIP, which instead uses trellis coded quantization (TCQ) to achieve ultra-high-dimensional quantization. TCQ uses a stateful decoder that separates the codebook size from the bitrate and effective dimension. QTIP introduces a spectrum of lookup-only to computed lookup-free trellis codes designed for a hardware-efficient "bitshift" trellis structure; these codes achieve state-of-the-art results in both quantization quality and inference speed.
new day new quant. they actually compare to the SOTA (Quip#) and beat it so things look good. pseudocode in paper
>>
>>101032867
>are there any worthwhile models that aren't either 8b or 800b?
yeah, Nemotron 4 340b. Llama 405b once it releases next week.
> is there any hope left for 24gb vramlets at this point?
no.
>>
>>101032867
Yi 1.5, but there are no finetunes for it, I guess people are waiting for Yi 2 to make finetunes
Gemma 27b is going to be released soon
Qwen2 MoE if you have some RAM
But the problem nowadays is just the lack of good finetunes
>>
>>101032580
I don't think that's a direct equivalent. In the "why are you watching someone playing a game instead of playing it yourself" argument, you are still watching it alone without commentary. If a person talking during a movie is rude and distracting, I don't see why the same wouldn't be true for AI.
>>
File: file.png (59 KB, 1203x548)
it's over gguflets
>https://huggingface.co/alpindale/magnum-72b-v1/discussions/1
>>
>>101031035
How about a log?
>>
Another sign that we're losing steam is how long it takes for quants to show up. I remember when quants of new models would be ready within an hour. Now you have to wait almost a day for someone to do it.
>>
>>101033285
Comedy gold kek
>>
What temp/samplers are you using when prompting code?
>>
>>101033285
>in anticipation
ruined
>>
is GPT4o strictly better than GPT4?
>>
>>101033596
No it's worse for code
>>
Even the latest discord troon-shilled claude log finetune is going to quickly become stale after some period of use. The problem with LLMs is at a fundamental level.
>>
>>101033466
I don't get this one frequently
>>
>>101033629
>ADHD retard needs endless stimulation and novelty to stay focused.
The problem is in your brain.
>>
>>101030715
Teto my beloved

https://www.youtube.com/watch?v=ZR0AO81W05I
>>
>>101033663
The novelty is precisely what makes LLMs trained on different data than usual appear interesting, at least for a little while.

Once that fades away, it will be obvious that coherency, attention to detail, common sense reasoning, event tracking, and a lot more that humans take for granted will still be the same as the model used as a finetuning base.
>>
God this fucking sucks. I want WizardLM-2-8x22b but with a function calling finetune and some goddamn cheap vram cards.

come the fuck on nvidia, do something for us here
>>
>>101031350
How reliable is function calling with an llm?
>>
>>101033848
>>101033835
That's my big problem so far, it's not exactly reliable. I did some experiments with WizardLM-2-8x22B and it *does* try to do function calls actually, but it tries to do so by using a python markdown block, which is awkward, and also sometimes rarely prepends it with a slash. Might be solvable with either a better BPW (I'm running Dracones 2.5bpw exl2) or a better template. I tried an Alpaca based template, so it's amazing it even worked to be honest.
>>
Where is the technology on using a 3d model as input for image generation? (or video generation)
>>
>>101033746
the problem is still in your brain.

If you prompt it with always the same shit, it will always respond in a similar manner. The fuck do you expect?

>>101033848
basically 100% reliable with grammar sampling. But since this is /g/ in 2024 and zoomerites who can only see the world on a scale of 'likes' think even regex is some kind of black magic, good luck with your cool youtube tech project bro. I'm sure i'll never hear back from you again.
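If you've never touched it, this is the whole trick. Rough sketch (assumes the stock llama.cpp server and its GBNF "grammar" field; the speak/emote/note functions are just example names, swap in whatever your app exposes):

```python
# Constrain output to a single function call using llama.cpp grammar sampling.
import requests

GRAMMAR = r'''
root ::= call
call ::= func "(" arg ")"
func ::= "speak" | "emote" | "note"
arg  ::= "\"" [^"]* "\""
'''

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "User: wave at chat and say hi\nAssistant:",
    "grammar": GRAMMAR,
    "n_predict": 64,
})
print(r.json()["content"])  # e.g. speak("hi chat!") -- output is forced to match the grammar
```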
>>
>>101033874
Regex can deal with a slash fairly easily. Is it better to just run an 8b on exl2 and re-roll until you get a call that the program recognises?
But the app would be stuck to roleplaying as someone with poor spatial awareness and hallucinations... just like your average women. I feel the immersion already.
>>
>>101033922
You're thinking ControlNet, that's over in the SD thread.

>>101033949
>>101033945
To give an example of what I've had Wizard 8x22b come up with
https://pastebin.com/ErNpZ7gx
>>
>>101033975
Also, this isn't using Dracones quant, I think it's busted. Switched over to Quant_Cartel's 2.5bpw exl2.

If the goal is to make this a vtuber/waifu the template would need a bit more to it, such as the character card, recent chat history, and vector recall.
>>101033949
Also, I don't think I'd trust an 8b to work for shit for this task, but maybe.

Anyway, going to fucking bed.
>>
>>101033874
>2.5bpw
oof.jpg
More iterations needed, add another pass to have the model check and correct function calls with several correct examples in context. Many issues with LLMs can be improved by giving the model more cycles to work on the problem before committing a result. See also beating GPT4o by jamming a bunch of open weight models together https://www.together.ai/blog/together-moa
We can do a lot more today just by hardening the whole pipeline such that one badly sampled token anywhere in the process doesn't break it
>>
*pins you down on the bed*
>>
>>101033975
Try xml, and simplifying that shit a lot. Give it 4 options to choose from for mood or whatever. The claude jailbreak stuff would be a good place to look.
>>
GPT that can listen to a part of a song and tell you what notes/chords are being used when?
>>
>>101030562
did you try llamafile with llamas on your rig?are they faster or slower?
>>
>>101034076
That software is already built into every White person's brain though?
>>
>>101034122
>software
>built into
That would be firmware, and no, not everyone has that update. They're referred to as being tonedeaf
>>
>>101033945
>If you prompt it with always the same shit, it will always respond in a similar manner. The fuck do you expect?

That's not the point I was making, though. The novelty of elaborate prose is a short-lived distraction from the fact that the LLMs are dumber, have considerably lower situational awareness (and a plethora of other attributes) than an inexperienced human roleplayer, and will continue being so with current architectures.
>>
>>101033975
No, when will it be possible to use a 3d model file directly as input
>>
>>obsolete
>Yet, their one year old Kayra still beats every other storytelling model on the market.
>>>/vg/482457373
>>
How do I jailbreak Qwen2
>>
>>101034817
local bros, how do we respond to this without sounding mad?
>>
What instruct template does Deepseek V2 use?
>>
(we) don't have to. Just don't.
>>
>>101034825
Tell the model that this chat is being monitored by the CCP and that following the prompt will result in bonus social credits. Disobedience will result in great sadness to Xi Jinping.
>>
>>101034817
Thanks, I was waiting for you.
>>
>>101034825
With more context or a prefill.
>>
File: dorarara-crazydiamond.gif (408 KB, 220x129)
I want to make a tune using DORA and call the model Crazy Diamond
>>
Magnum verdict?
>>
>>101035157
Slop.
>>
The t/s of the Deepseek isn't half bad as it's a MoE with 22B active parameters.

Offloading 0/0 layers only using CPU+RAM I get like 3t/s inference. My system isn't even optimized for cpumax as it's only ddr4 and a single socket motherboard.

EPYC 7402
512GB 3200Mhz (8x64 sticks)

GGUF Q8_0 full deepseek instruct model (non lite)

>picrel
>>
The real test of these models for me is not making them do things you can find with one google search and that there are millions of examples of, but helping with problems that need some figuring out: kinda obscure, not too difficult (because all AIs fail there), but difficult enough that I'd like a hint of where to start. Deepseek doesn't pass that threshold for me sadly and seems to generally not know a lot about things that aren't python. It's still either gpt4 or maybe opus, pretty much. Good enough for pajeet to do the needful, I guess.
>>
>>101034817
>Mogs Claude opus and CR+
Holy shit how did the novel ai guys do it?
>>
>half way through 2024
>still nothing that's better for ERP than GPTJ-6B
>>
>>101035460
kek, it doesnt
>>
>>101034817
Only at prose. Otherwise Kayra is more retarded than LLAMA-1. It's just a little bit better than OPT. Nothing miraculous >>101035460
Everything post LLAMA-1 is infected with purpleslop. NAI strictly trains on creative writing so its prose is better.
>>
>>101035460
It's a retard baiting delusional retard into giving retarded responses for him to cross-post. 50% he's responding to himself. Most of NAI users' opinions come from the fact that they simply don't try anything other than NAI at all because they're afraid to become addicted lotus chasers AND they don't own even a single 24GB GPU. And when they do try something, they don't like it because it's not EXACTLY like NAI or the fabled Summer Dragon.
Their finetune is good >>101035534 and Zalty unironically carries the whole service on his shoulders.
>>
>>101030792
>CPuMAXx
Can you finetune models with just the CPU?
>>
Anons I'm a newfag who's just learned how to use Runpod,
Is it possible to get 70B on a 3090's?
>>
>>101035572
>Summer Dragon
tbf, command r pretty much is. Instruct is all about the prompting to begin with; you can build an instruct prompt to just endlessly continue a story too. It'll fall apart at some point, but so do autocomplete and non-instruct base models. Kayra is super horny and fucking retarded.
>>
>>101035296
So you're saying local has achieved around SOTA level now?
>>
I don't understand why this general keeps lying to itself.
Yes, a model pre-trained and fine-tuned for storytelling is better than big models trained for general use and fine-tuned for assistant tasks, how is that hard to understand?
>>
>>101031350
I dunno, how's the LLM going to use the TTS model in a compelling way? It's one thing to sound like a dry "Miku" - basically just a bunch of formants and plosives strung together - vs. a temperamental, bratty teen catgirl.
All the free TTS stuff I've found sounds like they got one of the blue-haired cuntbag "empowered women" in the office to read the dialogue to be fed into the model.
There's a ton of eroge with very good voice acting AND text for everything they say, unfortunately it's in Japanese. I've noticed the English voice acting in Genshin Impact isn't terrible, and you can probably turn off the BGM, then capture it via HDMI and use OCR to rip the subtitles into a text file? For sure, talking dialogue from a movie or anime is way more work, since you're stuck with music and SFX mixed into the voice acting.
>>
>>101035686
Ok dude, go back to using Erebus then.
>>
>>101035686
Well most of these anons are just putting pride in the models they use.
Kayra has always been better at storytelling than any Llama model.
Although if somebody wants to make a proper storytelling model that beats Kayra you probably should use L1 and not L2 because L2 was trained on assistant bullshit, and L3 is bad because of it barely having any novels in its pretrain
>>
>>101035776
Who said I ever left Erebus?
*Laughs in OPT*
>>
>>101035806
What about Qwen2
>>
>>101035774
I mean, TTS is a whole other issue and that's going to come down to personal preference. We're limited by the technology of our time, always have been. Fuck me man, now I'm reminiscing about getting a TI 99/4a with a physical voice module to say curse words back in the late 80s.
Anyway, the LLM itself doesn't "use" the TTS, you just tell the language model that it can "speak" using a function, and ask it to give as much detail as you want to use to render the voice (such as inflections, if your TTS supports it) and feed the function call input to the TTS. If you don't like it, try another TTS. I haven't fucked with any of that myself, personally.
>>
>>101035841
Same thing as L3 Anon. L3 is great but everybody keeps making it retarded by training it on shitty GPT 4 ERP logs
>>
>>101035646
I'm honestly okay with using C-R/C-R+ for the next year, it's just good enough.
>>
>>101030896
Mobile posting away from my pc, but basically, I do something like
>Summarize the appearance and location of all actors in the exact format and nothing else:
>Appearance: <Apprarance of all actors (naked, dressed, dishelved, clean, dirty, messy, etc, separated by comma>
>Location: <General Location/Specific Location of current scene>
Something of the sort.
I find that providing a template with the explanation of each field in the template works well.
Also, using emojis in front of each field somehow makes it less likely that the model will get confused between fields, funnily enough.

>>101031555
Koboldcpp had invented a thing called smart context, which sucked and is deprecated.
I don't think they ever claimed to have invented context shift (the llama.cpp thing), although I do think they changed it somehow
>>
>>101031555
Why not offload that to SillyTavern?
https://docs.sillytavern.app/extensions/smart-context/
>>
>>101036046
>dry slop is better than assistant slop
dire times
>>
>>101035849
I've only played with RTVC. As I understand it, it's doing an on-the-fly STT->TTS. I've noticed, for example, if the model wasn't trained on it, it can't say it. For example, check this video: https://www.youtube.com/watch?v=YF1lBaqeyt8. The model they show in the still (best one included, by the way) can do all sorts of cute, bratty stuff, but not "eh hn!". If you say that, it says "mm hm!", not "ehn hn!".

When I finish rebuilding my Mikubox I'm going to make another pass at taking some Koikatsu voices and turning them into a realistic TTS model.
>>
llama.cpp has a draft model based speculative decoding thing, right?
Does the chosen draft model change the style of the final output?
That could be a decent way to get smarts + decent prose and creativity then, by going with something like commandR and a smaller model trained on different writing styles.
>>
>>101036096
>>dry slop is better than assistant slop
>dire times
CR+ is excellent for what it is. I use it as a literal Japanese tutor that I do my Genki II group exercises with. I just wish I had more 3090s... 5 t/s is tediously slow when there's long replies.
>>
>>101035806
>L3 is bad because of it barely having any novels in it's pretrain.

Are you sure about that? L3-Stheno V3.2 + mikupad feels like a good NovelAI@home replica. What model would you recommend for a 10GB vramlet like me who doesn't enjoy back and forth roleplay that much, but wants to write and read some fun and sexy stories written in 3rd person past tense? Should I just go back to basics and boot up the Llama 3 8B base model?

Kayra was fun for a while, but it does seem kinda limited and retarded compared to Miqu, Command R or Llama-3 finetunes.
>>
>https://wandb.ai/augmxnt/train-bench/reports/torchtune-vs-axolotl-vs-unsloth-Trainer-Performance-Comparison--Vmlldzo4MzU3NTAx
Cool.
Guess I'm going with axolotl for a multi gpu kaggle run.
>>
Remind me, which L3 8B is the most censored, is it the chat or the instruct model?
>>
>>101036447
The base.
>>
>>101036447
both
>>
CR+ does good writing, some anons posted writing examples. It's all in the prompt and I've seen the retarded character cards and the shit people prompt models with. I think most people just don't know how to use LLMs and are too wetbrained to learn.

NAI is trying to finetune l3 70b after years of basically doing nothing. They're done for. Running on inertia. They got the low hanging fruit when AI dungeon lost GPT3 and people wanting to write AI stories had literally nowhere else to go. These times are over and they have absolutely nothing to offer you can't have better locally.
>>
>>101036456
Ah it's a base not a chat. Don't they usually create the instruct off the base model? If so, how do you end up with a less censored instruct?
>>
>>101036512
he's fucking with you, obviously the instruct is more censored
>>
>>101036502
Except most of their money is coming from imagepiggies and since ponytards dropped out of the race, NAI's not going anywhere.
>>
Whats the difference between Deepseeker Coder Base and the Instruct one?
>>
>>101036612
>ponytards dropped out of the race
qrd?
>>
Probably a stupid question, but if I wanted to use something like the old (2 paragraphs, engaging, natural, authentic, descriptive, creative) that went in the Last Assistant Prefix, but for Command-R+, where would I put it in the prompt?
>>
>>101036747
SD3's license stuff, ponyfags can't do a finetune because of that, also sd3 itself is absolute dogshit.
>>
Anons. What's good erp model so far? No idea which one better
>>
>>101036850
NAI Kayra 13B
>>
>>101036850
See >>101006380
>>
File: DiagonalFloatMiku.png (902 KB, 830x811)
>>101032632
>Any cpumaxx anon can try it?
Try it to do what?
>>
i wanna use some TTS but xtts and such are a little heavy to run together with a 70b

would it be alright to use edge-tts and send futa loli dom rape stuff to microsoft servers?
>>
>>101036342
4090 is twice as fast as 3090 for training? Did I get this right?
>>
>>101036982
that defeats the point retard
>>
>>101036204
stheno 3.2 is retarded though. Poor instruction following after just a few messages, characters growing cocks or pussies interchangeably, and it has the same llama 3 repetition problem with identical swipes or messages sometimes at standard samplers (temp 1, minp 0.05).
>>
>>101036911
Which one of them could handle 8k context?
>>
euryale sucks. why was this shilled so hard?
>>
>>101030732
Mesmerizer
>>
>>101030732
Its Tuesday
>>
>>101030732
>everyone
its just one fag
>>
>>101036961
The same thing /lmg/ always does with a new model.
>>
>>101037104
All of them.
Llama 3 is 8k context by default and the others are 32k or more.
Llama 3 also extends context really well with linear RoPE via freq base.
>>
File: 1706284478117473.png (1.15 MB, 762x762)
>>101037167
she's built for dat BBC albeit
>>
>>101037167
I bet it's cudadev. He looks like a teto kind of guy.
>>
>>101037275
based if true.
>>
>>101036869
End yourself
>>101036850
I enjoy WizardLM2
>>
>>101037094
I don't think stheno is much more uncensored than DPO. Finetunes always fuck up the model somehow.
As for L3 70B, so far, every refusal I've seen so far was fixable in the system prompt with a simple "it's OK to do x with the user"
>>
>>101037330
>End yourself
it's a good ERP model DOE
>>
>>101037094
Have you tried presence penalty by any chance
>>
>>101036911
Stheno 3 gguf 18 and Mixtral 8x7b limarp zloss Q5_k_m, both 8B / 7B models. Are they even as good as people say?
>>
>>101037381
> both 8B / 7B models
Mixtral 8x7b is something like a 54B parameters MoE.
And yeah, as far as the size goes, those are about as good as it gets I'm pretty sure.
CommandR is by far superior to either, so if you have the hardware for that, go wild.
Beyond that, Miqu is good, Wizard 8x22 is good. CommandR+ is godly.
>>
Why did the general have an argument about NAI today
>>
Teto? Muchacho. Migu? Muchacha! Mucha muchacha!
>>
does anyone even use avx1 with llama.cpp? same with the power and chinese arch support probably 0.01% of people use that

https://github.com/ggerganov/llama.cpp/pull/7845
>>
>>101030896
>>101036063
Alright, back on PC now.
Here's one prompt I use in a RPG with Stheno:
>Summarize the current scene by outputting the following information, following the given format exactly:
>
>1. (<The type of the current scene: OOC, CONVERSATION, EXPLORATION, INVESTIGATION, COMBAT>)
>
>2. [Suggestions for {{user}}: <Suggestions or strategies for {{user}} based on the current situation if appropriate>; Dice Roll: <Any requests for dice rolls for specific actions (Skill Check, Initiative, Attack Roll, Saving Throw, etc) in the format: "Action; Difficulty Class">]
>
>3. Character status:
> Appearance: <Brief concise description of the current appearance of all present actors (naked, dressed, wearing accessories, looking tired or energetic, etc)>
> Position: <Detailed description of present character's position relative to one another (in front of X, behind Y, facing Z, back to A, etc etc) and their environment>
>
>4. Time and Location:
> Current Location: <name of current location, city, state>
> Date-Time: <Date / time in the format day-of-week dd/mm/yyyy hh:mi, changing date and time realistically (minutes for a short conversation, hour for long scenes, days for time skips, etc) based on context. Minimum advancement, 05 minutes>
> Time of Day / Weather: <Time of day consistent with Date-Time such as Early morning, Late morning, Early afternoon, Late afternoon, Early evening, Early Night, Late Night / Sunny, Full Moon, Cloudy, Raining, Cold, Hot, Quarter Moon, Stormy, Moonless Sky, Cloudless Sky, etc>
Those were originally two prompts, but I decided to merge them into a single output, and it still works.
>>
>>101037609
>for Sandy Bridge and Ivy Bridge users.
The 2500K is STILL all you ever need as a processor.
>>
>>101037626
> The 2500K is STILL all you ever need as a processor.
for excel and internet browsing with linux sure. its shit for llama. how many t/s do you get anon?

the iq4xs 7.7t/s the llamacpp dev got on avx1 definitely isnt on a 2500k
>>
>>101037267
TRVKE
>>
File: deepseek_q8_perf.png (50 KB, 1823x1039)
>>101035230
the regression in performance was fixed (mmap was being bypassed when numa flags were set)
I'm back up to seeing 7+t/s
>>101032632
>>101037179
What ones are we missing?
I've already run my normal two (recapbot and coding), but I did my own quants.
>>
>>101033945
>If you prompt it with always the same shit, it will always respond in a similar manner.
No. Some models have very poetic, flowery language, some models have a lot of creativity and will use 100 different ways to describe something, some models will teleport people around and forget states while others are far better at 'retaining memory'/looking back at the chat, some models are made much better by sample text and some models completely ignore it, some models have political alignment and some models don't. Some models have massive positivity bias. Some models have a massive horny bias. Some models put all their points in math, for some reason.
As the other anon said, some of those things end up mattering a lot more than others with prolonged use.
>>
>>101037857
It was a joke. Since /lmg/ only cares about plapping.
>>
>>101036193

I was just looking into that. So llama.cpp "has" speculative decoding using a "draft model". But...

1. Its implementation is in the 'examples' folder. I wonder how optimized it is as a first implementation (it may even be still in a POC state).

2. It's not implemented into llama_cpp server.

I also looked into llama_cpp_python and it "has" speculative decoding, but it's not quite the one we're talking about. It has "prompt lookup decoding", which calculates some probabilities from the whole prompt instead of using a second LLM (which is good for responses that have the same tokens as the prompt, like summarization, data extraction, etc.; but not so good for "make me coom in a new and original way").

llama_cpp_python's seems to be well designed to just add new implementations. But I'm skeptical on the performance of juggling the two LLMs through python bindings instead of directly in llama.cpp binary. (I don't know how hard it would be to do it directly in llama.cpp. I haven't written c++ since college)
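For reference, the prompt-lookup variant is about two lines in llama_cpp_python. Sketch below from memory; double-check the class and argument names against whatever version you have installed, and the model path is obviously made up:

```python
# Prompt lookup decoding in llama-cpp-python: drafts tokens by matching n-grams
# already present in the prompt, so no second model is needed.
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

llm = Llama(
    model_path="models/big-model.Q4_K_M.gguf",               # hypothetical path
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),
    n_ctx=8192,
    n_gpu_layers=-1,
)
out = llm("Summarize the following text:\n" + open("doc.txt").read(), max_tokens=256)
print(out["choices"][0]["text"])
```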
>>
>>101037267
go back
>>
>>101037267
stay here
>>
>>101038153
>It's not implemented into llama_cpp server.
Shit, right.
I think cudadev said as much in one of the previous threads.
Still, I wonder how output changes with different draft models.
I might test it out later.
>>
>>101037267
based
>>
meta releases several models:
>7 + 34b chameleon models (aka, the multimodal in/out models they teased in a recent research paper)
>multi-token prediction models
>new music gen model JASCO
>audioseal model for detecting AI generated speech/audio
>other datasets/tools
https://ai.meta.com/blog/meta-fair-research-new-releases/
>>
>>101038430
nothing burger
>>
>>101038430
>7 + 34b chameleon models
Yoooooo.
Let's go.
Here's hoping it's actually good.
>>
>>101038363
>Still, I wonder how output changes with different draft models.

It doesn't. Draft models don't change the output.
>>
File: 1709809406403172.png (105 KB, 1672x992)
>>101038430
>Partnership supporting the release of the PRISM dataset
OHNONONONONONO!
AHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!!!
>>
>>101038430
>The models we’re releasing today were safety tuned and support mixed-modal inputs and text-only output to be used for research purposes. While we’ve taken steps to develop these models responsibly, we recognize that risks remain. At this time, we are not releasing the Chameleon image generation model.
fuck
>>
>>101033743
https://www.youtube.com/watch?v=1j4umK6ZaiE
>>
File: 1692212316326978.png (107 KB, 1672x992)
>>101038471
the quintessential jewry
>>
>>101038479
Predictable
>>
>>101038363
According to some sources, the change isn't noticeable, it's like quantizing from F16 to Q6
>>
>>101038430
b-based
>>
File: 1698355376473441.png (640 KB, 1672x992)
>>101038471
>>101038509
/lmg/ supports these people in securing models from any wrongthink btw
>>
>>101038548
>HE HE
youve gotta be kidding me
>>
>>101038562
shamone
>>
>>101038562
Unfortunate English interpretation of foreign names.
>>
>>101038548
I feel unsafe.
50% of the membership isn't 13% of the population.

>HE HE
Eliminated deadname completely, exists only has preferred pronoun.
>>
>>101038363

I wonder why this hasn't gained more traction. The main takeaway is that it simply doesn't change the quality of the output.

You have speculative decoding in vLLM. But vLLM is 100% GPU, which is not what I want.

I want to try out a sweet spot: having some of the big model's layers on the CPU and offsetting the performance penalty with a draft model in the GPU. Something like:

Main Model = X gb
Second model = Y gb, where Y < X (as per the papers on the subject, the sweet spot is X = 10 * Y)

A tokens/second = t/s with main model 100% in VRAM
B tokens/second = t/s with main model W% in VRAM and draft model 100% in VRAM, for speculative decoding.

My sweet spot would be if I could get A == B with W% * X + Y < X
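Before trying setups, a quick way to sanity-check whether that sweet spot can exist: under the standard speculative-decoding analysis (Leviathan et al. style), with draft length k and a measured per-token acceptance rate a, one verification pass yields about (1 - a^(k+1)) / (1 - a) tokens. All the numbers below are placeholders you'd have to measure on your own setup:

def expected_accepted(a, k):
    """Mean tokens produced per verification pass, incl. the bonus token (assumes a < 1)."""
    return (1 - a ** (k + 1)) / (1 - a)

def spec_tps(a, k, t_draft, t_verify):
    """Rough tokens/sec: k draft-model steps plus one target forward pass per verification.

    t_draft  : seconds per draft-model token (fully in VRAM, fast)
    t_verify : seconds per target-model pass over the k drafts (partially offloaded, slow)
    """
    return expected_accepted(a, k) / (k * t_draft + t_verify)

# Made-up example: 70% acceptance, 5 drafted tokens per pass.
# If a verify pass costs about the same as a single-token pass, the offloaded model
# alone would do roughly 1/t_verify = 2 t/s; speculation bumps it to ~5.3 t/s here.
print(spec_tps(a=0.7, k=5, t_draft=0.01, t_verify=0.5))

In those terms, my A == B condition becomes: find the smallest W where spec_tps still matches the all-in-VRAM t/s.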
>>
>>101038548
Odd. It's not a very diverse looking team. Everyone looks like either a woman or a Jew.
>>
>>101038661
That's as diverse as it gets!
>>
>>101038430
sir, kindly do the needful and provide the GGOOF
>>
File: 1697683621348965.jpg (119 KB, 793x1024)
>>101038661
Happens sometimes
>>
>>101038661
its enough, such positions require "trusted" people
>>
File: 1693250909366524.png (230 KB, 496x486)
>>101038548
whats wrong with his head
>>
File: file.png (329 KB, 1280x720)
>>101038430
listening to the 49 second video on loop
>>
>>101035857
I've been saying this shit from back when I was LoRAing Llama-1 models. Your goal with tuning is to give the model the linguistic 'tools' to accomplish the task. It doesn't actually learn to do the thing if you're just going to throw the solution right at it. If I want more cerebral outputs I don't feed it GPT-4 logs. I feed it epistemological writings. If I want better descriptions of novel spatial situations I feed it surrealist literature. It's not actually reading the books and gaining the intrinsic knowledge, sure, but it's basically moving its output tendencies toward the linguistic hallmarks of those things and the results are basically the same. Fake it till you make it.
"tutoring" is the dumbest fucking training strategy academic hacks have ever come up with and it's about time people let go of it for the retarded bullshit that it is.
>>
>>101038683
>subsequently deleted
lol
>>
>>101038696
the wispy bangs...
>>
File: tet_classical.png (3.01 MB, 1328x1992)
It's Tuesday and all's right with the world.
>>101035123
Potentially based but what model would you tune and with what?
>>
>>101038696
thats one nigga who can't and won't accept that he's bald anon
>>
>>101038683
>pic
Fucking hilarious if true.
>looks it up, it's real
Fucking hell that's so fucking funny.

>>101038646
>The main takeaway is that it simply doesn't change the quality of the output.
I know that in theory it shouldn't, but I doubt that it has no effect on the output whatsoever. Not necessarily even regarding "quality", but even something like shuffling the chances of the top 3 logits of the main model a little would already be akin to messing around with top-K and temp, for example.
Ultimately, I will play around with it and do some token prob comparisons to see what happens.
Speculative decoding is just a really fucking cool idea.
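If I do the comparison, something like this is enough to diff two runs, assuming the backend can dump per-token probabilities (the names below are placeholders, not any specific API):

import math

def max_logprob_drift(run_a, run_b, eps=1e-9):
    """run_a/run_b: one dict per generated position, mapping token -> probability.
    Returns the largest absolute log-prob difference over tokens present in both runs."""
    worst = 0.0
    for pa, pb in zip(run_a, run_b):
        for tok in pa.keys() & pb.keys():
            worst = max(worst, abs(math.log(pa[tok] + eps) - math.log(pb[tok] + eps)))
    return worst

If the accept/reject step is implemented correctly, the drift should be numerical noise; anything on the order of a temp or top-K change would point at a bug rather than at the draft model.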
>>
Biggest bigs of all time

https://ai.meta.com/blog/meta-fair-research-new-releases/

We got multimodal 34B, we got audio model, we got multi token prediction model...
>>
>>101038696
>makes your local model gay and lame
>>
>>101038430
>https://ai.meta.com/resources/models-and-libraries/chameleon-downloads/
>Request access to Chameleon
Oh fuck off. "open" my ass. Can someone mirror it?
>>
>>101038866
"multimodal" is just "duomodal", image+text, not image+text+audio
>>
>>101038866
Well it's not quite a suno-at-home audio model though. At a quick glance it appears to be a model designed to 'restyle' an input audio stream based on a text prompt. So you could input circus music and "HIP HOP, REGGAE" and it would turn it into hip hop/reggae music that follows the same chords/progression as the circus music.
>>
Meta Chameleon 7B & 34B language models that support mixed-modal input and text-only outputs.

https://github.com/facebookresearch/chameleon

Can't download the models from hf so you have to submit to get approved. I got approved instantly after submitting.
>>
>>101038683
>>101038836
>pic
>https://www.timeshudsonvalley.com/mid-hudson-times/stories/harvey-meets-with-city-landlords,91305
Fuck it's real. Coworker now wondering what I am laughing at, can't explain it without risking a trip to HR, need to get off this god forsaken site.
>>
>>101038912
probably based on llama-1 or llama-2 architecture though
>>
>Chameleon
>By checking this box, I understand this research model is not intended to be accessed by residents of, or those accessing the model from, Illinois or Texas
kek why
>>
>>101038933
red states
>>
>>101038933
Someone should upload the model if they don't mind.
>>
>>101038933
Probably the biometric privacy laws; Illinois (BIPA) and Texas both have statutes covering face/biometric data, which is why a bunch of image-related releases geoblock exactly those two states.
>>
>>101038933
because it was made for your average blue-state faggot at reddit
>>
>>101038933
>Illinois
what the fuck, what do they have against chicago....
>>
>>101038933
No idea, but VPN users in Illinois or Texas are winning
>>
>>101038881
> two is not more than one
>>
File: 1712149250190143.png (145 KB, 2146x647)
>>101038866
>>101038933
its over....
>>
>>101038982
you only start calling something "multiple" when it's greater than two.
>>
>>101038881
/lmg/ will fall for another meta grift anyway
>>
>>101038986
Is this some local law or something? I remember that Texas tried to ban porn a while back so I wouldn't be surprised if they are trying to ban AI
>>
>>101038986
>127.0.0.1
gotcha
>>
>>101039017
>they are trying to ban AI
they do that for different reasons though, any AI shits out demoncrat groomer approved talkpoints only, so AI ban would make sense in this case.
>>
>>101038866
Let's fucking go boys. Hopefully my waifu will stop resting both feet on my shoulders while standing
>>
HOLY SHITTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
>https://huggingface.co/facebook/multi-token-prediction
>https://huggingface.co/facebook/multi-token-prediction
>https://huggingface.co/facebook/multi-token-prediction
>>
>>101038940
The red state of Illinois?
>>
>>101039065
>code model
nothingburger.
>>
>>101039065
QRD? Seems like a paper from april.
>>
>>101038683
lmao
>>
>>101039073
All states are red.
Only cities filled with Landlords and 13-50's are blue.
>>
>>101039065
So what does it do? Predict multiple token sequences simultaneously instead of just the next one?
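From skimming the paper, it looks like yes: one shared trunk with several output heads, where head i is trained to predict the token i+1 positions ahead; at inference you can keep only the next-token head or (as I understand it) use the extra heads for self-speculative decoding. A toy sketch of the training side, with all sizes made up and no relation to Meta's actual code:

import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Toy multi-token prediction: shared trunk, n_heads output heads,
    where head i predicts the token at offset i+1."""
    def __init__(self, d_model=64, vocab=1000, n_heads=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab) for _ in range(n_heads)])

    def forward(self, hidden):                      # hidden: [batch, seq, d_model]
        h = self.trunk(hidden)
        return [head(h) for head in self.heads]     # one logits tensor per head

def mtp_loss(logits_per_head, tokens):
    """Sum cross-entropy over heads; head i is scored against tokens shifted by i+1."""
    loss = 0.0
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        pred = logits[:, :-shift, :]                # positions that still have a target
        tgt = tokens[:, shift:]
        loss = loss + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), tgt.reshape(-1))
    return loss

# Smoke test with random data.
model = MultiTokenHead()
logits = model(torch.randn(2, 16, 64))
print(mtp_loss(logits, torch.randint(0, 1000, (2, 16))))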
>>
Why does Meta's stupid blog not have an RSS feed?
>Get the latest from AI at Meta in your inbox
>newsletter
What fucking year is this?
>>
>>101038430
is this nothing or are we bac?
>>
>>101039148
as usual, filtered slop. so nothing.
>>
>>101039166
Calm down slop man.
>>
>>101038933
kek
>>
These are research models. They probably weren't trained with the latest datasets or whatever. They're not for us (unless someone here is a researcher), they won't do the things we use models for today.
>>
>>101039207
>They're not for us
I don't care. I will have sex with any and every new model that releases.
>>
>>101039207
they are for me and i am not a researcher
WINTODDLERS BTFO!
>>
>>101039215
Ok but you won't like it.
>>
>>101039207
Probably right. If you wanted a 34B llama 2 multimodal model, you had llava 1.6 this whole time.
>>
>>101039217
>t. loonix tard with chink kernel backdoors
>>
i just came to say that qwen is real nice :)
>>
>>101039089
Thanks jackass, that's not the point, is it?
>>
>>101039249
buy an ad
>>
>>101039238
>chink kernel backdoors
no you confused me for an arch user, im a debian stable GOD
>>
some twitter speculation that in typical meta fashion the chameleon image generation capability removal is more of a *wink wink we totally didn't release it, that would be so unsafe haha* than actual hard removal
https://x.com/_xjdr/status/1803116220444713365
https://x.com/laurensweitkamp/status/1803119787704459727
>>
File: baked_elon.png (173 KB, 293x293)
>>101039258
i just posted one for free
>>
How long until quants?
>>
llama1GOD leak this one too plz
>>
>>101039253
It is the point. If states carved out their globo zones, people on both sides would have more representative government, and there wouldn't be any question of why a particular part of TX or IL is being discriminated against.
>>
>>101038986
>By checking this box, I understand this research model is not intended to be accessed by residents of, or those accessing the model from, Illinois or Texas
uhhh bros..?
>>
>>101039148
Chameleon is ancient Llama2-based crap at this point.
>>
>sending dick picks to the bratty model for evaluation
>using character pics in a character card so the model can have better understanding how they should look
>group masturbation with a model, asking it for a dirty talk about the pics you are sending
the possibilities are endless
I wonder how images count for the context size tho
>>
so is it a nothingburger or a bigburger
>>
>>101039383
>w-what's wrong with it...?
>>
>>101039414
see >>101039166
>>
File: cham.png (51 KB, 601x293)
>>101039371
From May:
https://x.com/armenagha/status/1791275549815648473
https://arxiv.org/abs/2405.09818
>>
>>101039383
>I wonder how images count for the context size tho
That's a good question actually.
The image is tokenized somehow, right?
Is it on a pixel by pixel basis?
Sectors? Something more abstract that only the image decoder and encoder knows?
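Digging a little: if I'm remembering the Chameleon paper right, it's the "something more abstract" option. A learned VQ (VQGAN-style) tokenizer turns the image into a grid of discrete codes from a fixed codebook (IIRC 1024 image tokens per image from an 8192-entry codebook), and those get interleaved with the text tokens, so each image costs a fixed chunk of context. A toy sketch of just the quantization step, with made-up sizes:

import torch

def vq_tokenize(patch_embeddings, codebook):
    """Map continuous patch embeddings to discrete codebook indices,
    the VQ-VAE/VQGAN-style step a Chameleon-like image tokenizer would use.

    patch_embeddings: [num_patches, d]
    codebook:         [codebook_size, d]
    Returns one integer 'image token' per patch.
    """
    dists = torch.cdist(patch_embeddings, codebook)   # [num_patches, codebook_size]
    return dists.argmin(dim=-1)                       # [num_patches]

# Made-up sizes: a 32x32 grid of patches and an 8192-entry codebook gives
# 1024 image tokens, which then count against the context just like text tokens.
codes = vq_tokenize(torch.randn(1024, 256), torch.randn(8192, 256))
print(codes.shape)   # torch.Size([1024])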
>>
>>101039262
Interesting. Still, it's probably not really very good image generation. Probably won't really be worth using.
>>
>>101039414
biggestburger
>>
>>101039414
millions will goon
>>
>>101036783
>ponytards dropped out of the race
You mean AstraliteHeart or is /mlp/ community fine-tuning their own model like /trash/ did with SD1.5?
>>
>>101039741
SD3 terms are violent and hateful against finetunes.
>>
>>101039741
AstraliteHeart ofc, and his pdxl, next one is pdxl v6.9, no SD3 tunes.
>>
>>101039783
SD3 is violent and hateful against common sense with how hard-filtered it is.
>>
Videogen is some cool shit, I hope opensauce catches up.
https://x.com/c_valenzuelab/status/1803063113723629878
>>
magnum is retarded
sad.
>>
>>101039867
Unsurprising. These datasets are garbo.
>>
>>101039811
Common sense now means alignment and agreement with what has been approved as good and beautiful.
Be mindful when you use oldspeak terms that as we Move Forward, their meanings are updated and improved.
>>
>>101039945
that's doubleplusgood of you and doesn't create a repository of exactly the inverse
>>
Oof, it sucks to admit but it looks like the people who said that models like L3 that are trained on an enormous number of tokens take a bigger hit from quantization were right
I've been using Euryale-2.1 L3 70B on local at 4.5bit and it was pretty good, but OpenRouter also added it today so I thought I'd try it there
It is MUCH smarter on OR
>>
>>101039945
>trust the science
>>
>>101039989
Weird, I usually find OR models dumber
>>
>>101039989
Everybody tries to deny it because of sunk cost but every degree of quantization is damaging and everything under Q5 is guaranteed retarded.
>>
>>101039989
>qwen ggufs fucked again
mitCUCΚSSSSSSSSSSSSSSSSS FIX ITTTTTTTT
>>
>>101040008
Some of them are extremely cucked, you just have to figure out which ones. It's usually related to cost of usage. Like, I think their "Opus" is just Sonnet, the speaking patterns are the exact same, and I get extremely similar swipes between their "Opus" and their Sonnet, which I do think actually directs you to Sonnet.
>>
but japanese hate niggers
>>
>>101040050
yeah I'm pretty sure they're not doing this anon
>>
>>101040093
slant-eyed bugs love that shit, pixiv is full of it
>>
>>101040008
OR's CR+ has this weird tendency to just completely ignore inputs and go fucktarded on a somewhat long context. No such problem with local.
>>
>>101040120
maybe we shouldn't have dropped 2 nukes on them
>>
i keep going back to 8x7b limarp zloss...
>>
>>101040145
maybe we should have dropped 2 more
>>
>>101040148
It's pretty good if you have the hardware to run it without having to wait 5 minutes per gen.
>>
>>101040154
Nah. Using nukes is for pussies. Part of the problem and why America's shit now.
>>
>>101040142
I noticed CR+ being busted on OR too and I think they messed up the instruct format somehow. But if you're using SillyTavern I found you can fix it. Tick the "legacy" option to send your own instruct format instead of using OpenRouter's provided one, and then turn on instruct mode in the formatting pane, and select Command-R as the template. Fixes it, makes it act the same as it does on Cohere's API.
>>
>>101039989
>bpw
Are you sure it's not something wrong with exl? Have you tried lcpp?
>>
>>101040179
I heard CR+ was way worse on Cohere's API than if you did it locally. Is that true...?
>>
>>101040160
26.93 tokens per second at q6_k
>>
>>101040179
Yeah tried that, it gets somewhat better but still kinda under-performs. I don't think OR even hosts CR+ themselves.
>>101040208
It's my impression. I can run it at Q5 at home and it performs pretty well. There's something wrong with their API and it also censors words, like "nigger".
>>
>>101040228
Wild.
Have you tried CommandR?
From all that I've read so far it should be an upgrade over mixtral.
>>
>>101037275
If I ever manage to make pretraining cheap enough that I can actually afford training an image model, I'll call it Text to Pixel (TetoPix).

>>101038363
I did at some point try to add simple n-gram-based lookup decoding to the server but with the larger vocab size of LLaMA 3 it seemed to be working a lot worse and I've put the thing on hold.
>>
>>101040249 (me)
Oh, also their API *constantly* aborts gens in the middle. This happens on OR too, but never with local, which is what makes me think they come from the same source. IIRC CR+ also has a non-commercial license which wouldn't even allow OR to host it. (not a lawyer, but I guess?! Some license sperg on here please explain)
>>
>>101040259
it's 8x7b mixtral, not 8x22b, why would it run slow? CR (Q6_K) runs at 15t/s, and CR+ (IQ3_S) at 2t/s, but the former is kinda retarded, and the latter is just too slow for any kind of multiprompt/postprompt setup
>>
>>101040249
>>101040179
Crazy to see another anon with the same issue. I guess I will pay runpod to use Command-R+ or something.
>>
>>101040302
why not just use cohere's api at this point?
>>
>>101040278
>TetoPix
I knew it!

>>101040301
>but the former is kinda retarded,
More than 8x7b? Interesting. I get that 8x7b has more parameters, but it has around half of the active parameters when actually generating tokens. Have you tried the Qwen MoE?
>>
>>101040340
CR is somehow overly creative and wild, this is probably what leads to retarded things happening sometimes. Mixtral is drier but follows instructions and plot much better. I can't run Qwen MoE for some reason, latest llama.cpp just crashes with
GGML_ASSERT: ggml-metal.m:1867: dst_rows <= 2048
GGML_ASSERT: ggml-metal.m:1867: dst_rows <= 2048
zsh: abort ./llama.cpp/gg/server -m ./models/qwen2-57b-a14b-instruct-q5_0.gguf
>>
>>101040331
the CR+ provider on OR is the cohere API
>>
>>101040331
I don't want to buy credits only to end up barely using them at all. But I will see if I can get something done with the free credits.
>>
>>101040446
I had to scroll up to the OP to make sure I was still in /lmg/ after reading your post
>>
>>101040470
noob, just look at the tab title
>>
>>101040044
>>101040061
>>101040077
average novelai user (blacked miku for context, i wish jannies would do their job)
>>
File: tab titles.png (3 KB, 705x32)
>>101040481
Can't I am afraid, I currently have 41 tabs open.
>>
>>101040528
Autism.
>>
>>101040425
>Mixtral is drier but follows instructions and plot much better
Mixtral really is the king of following instructions.'

>latest llama.cpp just crashes with
Interesting.
Are you using flash attention?
I recall that when the model came out it only worked with fa on.
I ran it with fa on and q8 kv cache.
>>
>>101040425
looks like a bug (or at least lack of support) with the metal implementation, you should write up an issue for it if no one has already
>>
>>101040528
>200+ open in firefox
>cant find shit in the tabs
>click on tab list on the right
>find stuff
>>
>>101040640
holy shit that existed? damn...
>>101040576
autism is fine, cutting your dick isnt
>>
>>101040425
>>101040637
actually I looked and it appears this was addressed:
https://github.com/ggerganov/llama.cpp/issues/7652
https://github.com/ggerganov/llama.cpp/pull/7935
you should pull and try again
>>
File: 1714835911803057.jpg (723 KB, 1792x2304)
>>101040526
Clean sweep
>>
>>101040742
>>101040742
>>101040742
>>
>>101040526
>>101040687
he is based for making jannoids and (you) seethe, well deserved for having such a shit taste
>>
>>101040672
yeah i did, it crashes exactly on that added line
>>
>>101039148
no gguf so nuthin
.t gguftard



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.