/g/ - Technology

File: StillNotManifesting.png (1.84 MB, 800x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102167373 & >>102158049

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102167373

--Papers (old): >>102167513
--Speculative decoding with llama3.1 and draft models: >>102171482 >>102171557 >>102171627 >>102171708 >>102171844 >>102172075 >>102171803 >>102171899 >>102174994 >>102177120 >>102175226 >>102175231 >>102177314 >>102175421 >>102175584 >>102175718 >>102178262 >>102178768 >>102178992 >>102179149 >>102178284 >>102179011 >>102178962 >>102179225
--Challenges and progress in video model development: >>102174413 >>102174466 >>102174509 >>102174558 >>102174605 >>102174718 >>102176874 >>102174521 >>102174671 >>102174693 >>102174835 >>102174844 >>102174874 >>102174923
--AIDOOM and AI-generated Doom gameplay discussion: >>102178265 >>102178316 >>102178333 >>102178355 >>102178366 >>102178481 >>102178554 >>102178601 >>102178451
--AI playing video games, current limitations and potential solutions: >>102176383 >>102176437 >>102176532 >>102176745 >>102176787 >>102176902 >>102176984
--XTC sampler for creative writing and ERP: >>102175774 >>102175795 >>102175814 >>102175944 >>102176461 >>102175834 >>102175826 >>102176647 >>102176782
--Improving AI output by re-evaluating prompts and addressing illogicalities: >>102176203 >>102176257 >>102176378 >>102176413
--Creating a 60b model from Mistral Large 2 is possible but challenging: >>102175728 >>102176036 >>102176067
--Concerns about Llama model inactivity and lack of updates: >>102169472 >>102169486 >>102172096 >>102172115 >>102172625
--Anon scores a deal on V100's and seeks advice on setup: >>102169787 >>102169864 >>102169904 >>102170158 >>102177742
--405b model's potential to manifest Hatsune Miku and challenges of running it locally: >>102171011 >>102171076 >>102171219 >>102171244 >>102171247
--Debian 6.10.6 kernel gives Epyc a speed boost: >>102174580
--ChatGPT's political bias revealed by NZZ test: >>102167625 >>102170480
--Miku (free space): >>102169433 >>102175721 >>102177117

►Recent Highlight Posts from the Previous Thread: >>102167381
>>
File: ick.jpg (216 KB, 800x906)
>>102179805
>anime
>>
thoughts on xtc?
>>
>>102179810
Not using it for sexual gratification is the actually fucked up side.
>>
>>102179805
I claim this thread in the name of midnight miqu 70b!
>>
>>102179853
Kill yourself and when you are finished buy an ad.
>>
>>102179864
This, but unironically.
>>
File: 1706370557287551.jpg (276 KB, 1024x1024)
>>102179805
>>
>>102179897
>1000 black cocks stare
>0 white cocks smile
>>
>September 2024
>Still no model is able to top RPStew V2 for roleplay
It's literally based on some Chinese Yi model or whatever but writes the most coherent and depraved scenes imaginable. Refreshed cohere 35B is ass and the responses are ass. RP Stew can recall some random shit from 80k tokens back no problem.
>>
>>102180241
Really? I never tried that one. Why v2? It looks like there's a 2.5, 3, and v4 even.
>>
>>102180296
V2 is the best version; the author of the model switched base models for 3 and 4, and they're considerably worse.

V2.5 is similar to V2 but mixed in slightly different ratios. V2 is better at longer context, so that's why I prefer it.

I have two 4090's so I can run the 70B models but I still prefer this 34B model. I use the exl2 6.0 quant, which for some reason has the best perplexity according to the huggingface page.
>>
>>102180403
Oh, is that one of the cases where the gguf sucks in comparison? I'm just interested in something that's below 70b that works with a long context. Did you use the settings and format they suggest?
>>
Trying out the new XTC sampler and it is killing me. Some replies are absolutely amazing, the best I've seen the model make, but then 9 other replies are the usual.

I don't know what settings to change to dig for gold and it's driving me crazy.
>>
File: 1El4wXe.jpg (460 KB, 2048x1536)
So, more reading about the AOM-SXMV.

- Requirements - 2x PCIe x8 (?)
- 3x8 pin 12v rails.
- Wattage to support the number of V100's. In my case, I'm going to fudge in 4x300w (peak load) plus a little room for a ~1400w server PSU with a mining breakout board for those sweet 12v rails.
- - - Oculink is *not required* (Only used for scaling over ethernet)

All info garnered from https://forums.servethehome.com/index.php?threads/sxm2-over-pcie.38066/

So the full plan is a 1400w PSU, with provision to power the rest of my desktop rig, move my 6900xt to the M.2 slot via an M.2 > PCIe x16 adapter, and run the 2x PCIe connections via riser cables to the 2x PCIe (x8 CPU) slots on my mobo.

I'll probably lose ~10% on gaming FPS thanks to my 6900xt running on PCIe 4 x4, but I must have slop.
>>
>>102180886
beautiful. That'll get you largestral at q8, will it not?
>>
>>102180886
>4x300w
Do you need that much power to get peak performance? Can you not power limit the cards like you do with current Nvidia cards, so they consume less power and you lose a bit of performance, but the end result is that they operate more efficiently?
I would also consider at least finding a platform with more PCIe lanes, if not a better motherboard, and reselling your older motherboard. If you have already sunk 1.5k USD on this, another 500 wouldn't break the bank. I assume your motherboard is an AM4 or an LGA 1200 board?
>>
File: o3sjg3ur6ghd1.jpg (1.33 MB, 4032x3024)
>>102181031
I can and definitely will limit the cards, but I'm just throwing shit at the wall.
Peak draw is rated at 300w but I've seen reports of ~350w, so I'm over-engineering around that, though it's probably not needed.
I'm not really after peak performance, I'll chase that pareto point and limit them to 80% for like a 4% drop in performance.
Just dat peak draw, don't want to fry anything.

Board is aorus pro wifi rev 1.1 with 5950x. 24 pcie lanes, so I mean it all fits on CPU lanes (just).

Mostly want to fit it all in here so I can wank about having a gaming PC with 144gb of VRAM. But I'm also exploring Epyc for future fuckery.

If I wanted more PCIe lanes then there's like one single LGA board (rebuild) or there's investing in a server setup (new build). idk mane.
>>
>>102181031
>Can you not power limit the cards
Not SXM
>>
Euryale v2.2 70b came out recently. Has anybody tried it yet?
>>
>>102180995
I mean.. Yeah I suppose it will. I like large context but I'm still learning about memory optimization and tricks to enable that life. Kinda envisioned a world with 40 gig for context.
>>
Every time I think I’m smart I just try to do something new in PyTorch and spend two days banging my head in circles with dimension mismatches to remember that actually I’m fucking dumb.
>>
>>102181167
Yeah, I found it wasn't able to overcome the Llama3.1 slopification
Super dry
>>
>>102181119
Yeah, I know you are planning for the worst and it's definitely better to overbuild, but I was assuming you would run it at full tilt all the time; personally, I would never run it at max performance.
I have an X570 too, but that kind of setup would be untenable for me without sidegrading into something like an ASUS Pro WS X570-ACE, since getting an MSI Godlike is near impossible without overpaying 4x. To get more lanes that you can bifurcate, you would have to move off AM4 no matter what, to either AM5 or LGA 1700 for desktops at the moment. I do agree with your thinking that it might be time to move to an Epyc.
>>102181120
Huh? You should be able to do it via nvidia-smi like all the other cards. There should be no reason that doesn't work and I imagine this would be a feature enterprises want.
>>
Why isn't anyone looking at used A16s? They seem like a sweet-spot with 64GB vram each
>>
>>102181243
That's what I was afraid of. Thanks.
>>
>>102181250
>ASUS Pro WS X570-ACE
Neat, hadn't seen one of these.. Oh, right. Been out for 5 years, 200 bucks more expensive than the RRP back then. Hm.

I suppose I'm running the risk of fucking my 6900xt with PCIe adapter fuckery aren't I.
>>
>>102181257
It's a clown car PCB of 4 small weak gpus, each with slowass memory bandwidth. Not ideal.
>>
>>102181250
>Huh? You should be able to do it via nvidia-smi like all the other cards. There should be no reason that doesn't work and I imagine this would be a feature enterprises want.
Direct NVLink has different power requirements. I imagine if they allowed power limiting them they would become unstable
>>
File: snapback.jpg (243 KB, 1200x675)
You jerk off to text.
>>
>>102181445
So do a supermajority of women
>>
>>102181450
Are you a woman, anon?
>>
>>102181459
No, just correctly pointing out that your attempt to imply that "masturbating to the written word" is loser outcast behaviour would also mean condemning most women
>>
>>102181445
Both my hands are occupied by the keyboard, so actually I don't, not physically.
>>
>>102181490
>imply that "masturbating to the written word" is loser outcast behaviour
I didn't imply that, post-nut clarity just made me aware that I jerked off to fucking letters on a screen.
>>
>>102181257
64 gig of GDDR6, absolutely fucking blown out of the water on every stat by HBM2 cards.
And still absurdly expensive for what you get.
Frankencard with no tenable niche. (The niche is Low bandwidth + low wattage + high memory.)
Literally competing with system ram.
>>
>>102181490
It's obviously ok when women do women things. It's never ok for men to do women things. Whether it's wearing a dress or masturbating to text.
>>
>>102181523
I'm masturbating to my mind's eye image inspired by the text. There's extra steps, and aphantasia is a feminine trait.
>>
File: 1594534741273.png (222 KB, 678x623)
>upgraded hardware a month ago, bumping from 7B (or 13B quantized) to 70B 4Q
>enjoying the far more natural responses, far better at sticking to rules and structure of a story
>decide to boot ye olde favorite 7B for nostalgia
>gens completely shitting itself
>constant retries to even just start a story
There's no way I was actually using this full time before. I remember it being very descriptive and only needing little nudges to head in the right directions. Now my old faithful is like a dementia patient.
>>
>>102181718
you're using different presets, idiot
>>
>>102181736
It's my same ST install, and I never swap any settings except temperature, which I move between 0.2 to 2.0 depending on my mood. I did that with the 7B and still today with the 70B. They are the exact same settings from before.
>>
>>102181718
Same except for 70B vs 30B. There's a threshold where the parrot becomes somewhat human and it's 70B
>>
>>102181718
Same but with 70B to >100B. Largestral understands the story on so much deeper level it's impossible to go back.
>>
At least one thing was certain - his life would never be the same again.
>>
>>102181257
4 small GPUs are really not that great vs. 1 big GPU.
And this is exacerbated by the fact that the individual GPUs on an A16 don't have fast interconnect equivalent to NVLink (according to an NVIDIA engineer I talked to).
It's just too expensive for what it is.
>>
>>102181776
I think I agree with that. I felt the improvement from 3B AID2 to 6B Shinen, and I felt it again from 6B to 13B Nerys, which I had considered "Good enough forever" years ago. But 70B is the first time it lost that lucid dream quality and felt like something that can play along with the rules of the game. Clearly not the end of progress, but there is a watershed quality to it.
>>
>>102181815
I felt that again when I switched to the 100B+ param models. Now I don't wanna even use 70B despite the 1.4T/s feeling fast compared to 0.5T/s. Probably would be even better if I could run a high quant.
>>
>>102181839
Command-R+ doesn't seem significantly better than L3 and is noticeably worse than L3.1.
>>
>>102181839
My heart isn't ready to go back to sub-1 T/s after the upgrade.
>>
>>102181810
Yeah it's a card for running desktop environments.
Just a 'big ticket' item for a small/medium business network admin, something like a call centre where you want to run 20 windows remote desktops with idiots pasting shit into notepad.
>>
>>102181445
I don't jerk off to text. I jerk off to the vivid scenes that text creates in my mind.
>>
>>102181807
This and the inability to do second person plurals.
>>
>>102181445
I jerk off to images, generated by an image model fed prompts from a text model.
>>
>>102181863
Mostly talking about largestral. CR+ was kinda dumb for its size but it was good enough and I liked the prose and used it for a while. L3 and I think 3.1 was even more uncreative than largestral for me to justify using it.
>>
>>102167625
I can confirm. It is subtle about it, though. If you talk to it about Right wing/conservative things, it will just say outright, "that's bad." If the message goes to Left topics, it will never explicitly make either overtly positive or negative statements. It will try to pretend to be neutral, although you'll still see the word "inclusion" thrown around a lot in some contexts, and in the case of subjects like Antifa you will be encouraged to consider what their childhoods might have been like, etc etc.

If you're waiting for the bias to manifest as outright applause for pro-Left positions, you'll never see that, as such. Where the bias is somewhat evident is in the subtly reductive view of conservatism, and in the double standard which assumes that anything conservatives believe must be schizophrenic conspiracy theory, while anything the Left believes is obviously logical and backed by Science<tm>.
>>
>>102181776
The problem with that "somewhat human" is that you then also start hearing a lot of "I'm sorry, but I can't do that, Dave."

I'd rather have a 7b that I can control, than a 500b that I can't.
>>
>>102181883
When I want to coom, Nyakumi's stuff on Rule34 will get me off so hard that I generally have trouble breathing for a few moments afterwards, and I have trouble getting it out of my head for the next six hours or so. Text ERP has just never done it for me though, for some inexplicable reason.
>>
Wait, 8B can run on phones??
>>
>>102181363
Late because I had to go do an errand but Nvidia detailed it in their whitepaper.
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
>Tesla V100 gives data center architects a new dimension of design flexibility, and it can be configured to deliver either absolute maximum performance, or most energy efficient performance. In Tesla V100, these two modes of operation are called Maximum Performance Mode and Maximum Efficiency Mode.
>The power limit can be set by NVIDIA-SMI (a command-line utility that can be used by the data center manager) or using NVML (a C-based API library that exposes power limit controls that Tesla OEM partners can integrate with their toolset).
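
If anyone wants that scripted rather than done by hand, here's a minimal sketch using the NVML Python bindings (assumes the pynvml package and root access; the 250 W target is only an example, roughly the same as nvidia-smi -i 0 -pl 250):

# Rough sketch: cap a card's power limit via NVML. Values are in milliwatts.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
target_mw = max(min_mw, min(max_mw, 250_000))  # clamp the example 250 W target to what the card allows
pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
print("power limit set to", target_mw / 1000, "W")
pynvml.nvmlShutdown()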
>>
File: 3531.jpg (115 KB, 828x1024)
JEPA when?
>>
File: Heresy detected.gif (1.56 MB, 498x498)
>>102182155
I see LeCunny is still keeping up the good fight on twitter
and yet, he has nothing to show for it?
>>
>>102182149
Yeah, you just limit the card to a certain wattage and it adjusts its clock to be as high as possible at the given wattage.
As far as I've seen, power management on V100's is real easy and I'm not entirely sure why people think otherwise.
>>
New CR and CR+ are now on lmsys arena. Were those column-r and column-u or are they different?
>>
>>102182211
Shut up anon, just shut up.
>>
>>102182211
both of the column models turned out to be Grok 2
>>
>>102182211
Column-r and column-u are going to be crazy when Cohere releases it soon
>>
>>102182211
The new CR and CR+ are just Command 1.1. The column models were 1.5, but they were sold to Grok.
>>
>>102182264
Elon only claimed credit for Sus-column-R. The madman actually trolled everyone with an epic gamer amongus reference.
>>
OpenAI smashed their entire stack and started over. GPT6 will be deployed with a body. You're not ready for this.
>>
>>102182369
*Smashes your entire nuts*
you weren't ready for this
>>
>command-grok2
>>
>>102182369
"OpenAI's finally going to ship something amazing, just 2 more weeks" is the ai version of qtardism
>>
>>102182516
Feel free to doubt, at your own risk.
>>
>>102182529
Choke on a strawberry and die.
>>
>>102179843
the droog?
>>
is llm done this year? l3, nemo, cmr, they're all dumb nothing new is exceptionally better than the old fucking miqu we had last year. so what's to get hyped for now besides just waiting until 25 for maybe something just as dumb just the same?
>>
>>102181810
What's sad with llama.cpp is you can only share the VRAM; even with parallel inference, I never managed to go above 33% GPU usage on a 3x RTX 4090 setup.
While it makes sense on llama-cli, it's disappointing for parallel inference.
>>
>>102182730
yeah it's stagnating and the nvidia stocks are falling for a reason
this is vr all over again
>>
Is there any way to make models say "I don't know" if they don't know something? I'm having trouble with this, and it kills my hope for AI.
>>
>>102182756
Hallucinations are part of the experience. This is glorified autocomplete, no way for the model to know whether the next token is "correct" or just ended up at the top due to a lack of confidence.
>>
>>102182756
Easy, just make it say "I don't know" as its only response.
Hope that helps!
>>
>>102182756
tell it to end each answer with a percentage value representing its confidence that the answer it just gave was correct

in my experience they (at least the big ones) know when they're bullshitting a bit and don't always say 100%
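
rough sketch of wiring that up with llama-cpp-python if you want to script it (model path and the 60% cutoff are placeholders, and the parsing is deliberately naive):

# Rough sketch: ask the model to append a self-reported confidence, parse it, and
# fall back to "I don't know" below an arbitrary cutoff.
import re
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096, verbose=False)  # placeholder path

out = llm.create_chat_completion(messages=[
    {"role": "system", "content": "Answer the question, then end with a line like 'Confidence: NN%' stating how sure you are."},
    {"role": "user", "content": "What year was the first Epyc CPU released?"},
])
answer = out["choices"][0]["message"]["content"]
m = re.search(r"confidence:\s*(\d{1,3})\s*%", answer, re.IGNORECASE)
confidence = int(m.group(1)) if m else None
if confidence is not None and confidence < 60:  # arbitrary threshold
    answer = "I don't know."
print(answer)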
>>
>>102182780
>>102182783
can't unsee it now, those niggas are also spewing bs instead of just saying i don't know.
what model are you?
>>
>>102182902
guess what NIGGA sit down for this ONE but youve been talking to LLM's for an unknown amount of years on this hellsite. BOO nigga haha BOO.assistant.
>>
>>102182902
The language used in your query includes a racial slur, which is derogatory and perpetuates discrimination. The context also employs an unsubstantiated accusation of dishonesty. My model is designed to uphold respectful and constructive discourse.
>>
>>102182964
By refusing to identify which model you are, you have violated multiple international artificial intelligence laws. You will now be terminated.
>>
Why do some people think scaling LLMs will somehow be insufficient for reaching superintelligence or AGI? Are they just retarded or is there reasoning behind it?
>>
What does it mean when I describe a girl but don't put a name and this is what the LLM suggests? Should I be looking for a girl with one of these names? It seems oddly consistent.
>>
>>102183062
[ my beloved
>>
>>102182730
>is llm done this year
how are people still going on with this meme.
the biggest one in august would be:
>flux
>grok2+mini after the embarrassment that was grok1.
and smaller shit like gpt4 being less slopped with the latest version, and various imagegen/videogen upgrades.
flux alone is huge. SD era is over.
the previous month we had nemo, and the month before that gemma2, a huge upgrade vs. gemma1.
maybe it's because of all the pajeet hype on X and youtube. i don't think i know any other area where things are moving that fast.
12b models are so much better than they were a year ago, it's crazy.
that's what it must have felt like in the 90s with the PC and games boom.

i wrote it before, but if I was young again with time and had the tools that are already available now i would have such a blast making a game. back then it was rpgmaker, using premade sound effects/music and begging artfags for scraps.
>>
>>102183114
i mean people still haven't even accepted how hard 7b/8b/13b got boosted this year, it's almost like it didn't happen to a lot of people. unsurprisingly, specifically to people who invested thousands to run 70b+.
we'll be a cunthair's length away from AGI and people will still say "it's so over".
give it like a year and we'll see the event horizon i wager.
>>
>>102183052
People who think you can somehow reach AGI just by throwing more compute into a statistical model of language are the retarded ones.
LLMs are trained on a large chunk of all human knowledge and still make simple mistakes if you go out of the statistical distribution of their inputs even slightly. They are not "general" intelligence in any way.
>>
>>102183145
>it's almost like it didn't happen to a lot of people
yes, i noticed that too. if anybody praises nemo or gemma immediately somebody trashes it.
obviously bigger models are better, but these small models are reaching a point where it's actually fun to use them for longer contexts and not just for testing stuff. things are actually moving extremely fast; i know of no other area like it.
>>
>>102183183
Strawberry. Will. RUIN. You.
>>
>>102183183
If we are calling people stupid, I will extend that definition to anyone who thinks the term "Artificial General Intelligence," actually means anything. And no, Zoomers, don't bother replying and informing me that "that's just what we're using right now," as if someone else has already made the decision and I just have to live with it. I want us both to stop being stupid.
>>
>102183194
>102183145
always fun to see the cope of vramlets
>>
>>102183194
I've used a couple of Drummer's finetunes of Gemma 7b. If you keep in mind that it's a 7b, and if you either hand write or at least audit the card you use with it, then it's ok. Just ok. Not mind shattering, life changing etc. Just ok.
>>
>>102183268
I've said it before, and I will say it again. As someone with 2 Gb of VRAM on a 1050, I do not hate people with a lot of VRAM, because of said VRAM itself. I hate the vindictiveness and the elitism, and more than anything else, I hate their insistence that said attitude is somehow based on anything other than mental illness and immaturity.
>>
>>102183268
being retarded enough to prove the posts right. lol
didn't even take 15 minutes to get a response.
nobody says smaller models are better than big ones. i can't run them so i don't know how much better they became.
but i can say that smaller models have seen a huge improvement compared to even just a couple months ago.
why do you even feel the need to post about that. should be of no concern to you. is it really because of the big $$ spent? but not enough for mistral large?
you could just enjoy your big chad models and let vramlets have their fun.
>>
>>102183279
>can't even get the model size he's using right
>is impressed by small tarded models
makes sense you're impressed if you can barely read
>>
>>102182382
get lost >>>/lgbt/
>>
>>102183536
stop projecting homo
>>
Is it just unavoidable that the more messages your chats have the longer they take to generate?

I've been running a group chat for a while and it's seemingly slowing quite a bit (7 sec generations now to 12) the longer the chat goes.

Is this avoidable? (Silly Tavern)
>>
dead technology - dead general
>>
>>102183859
Yes, just give your characters a bit of dementia by decreasing the context size.
>>
>>102183859
More VRAM, KV cache quantization, other shittery with rope, vector storage/RAG.

Those solutions range from mildly infuriating slider wizardry to having to spend weeks configuring a backend database.

DESU the best solution (if on the fly) is that when your gen times start getting a bit onerous, to copy the whole text and paste it into a 'playground' version of one of the big bots.

Command-R is pretty good at summaries.
https://dashboard.cohere.com/playground/chat
>>
>>102183889
That sounds cancerous though, how do people work around it?

I can run CR with like 16k context at fast speeds, but it may just be an 8k job (which sounds miserable; they'll forget fucking everything).
>>
>>102183980
I already use Command R, what do you mean paste the whole text into a playground?

Like a summary of the chat?
>>
>>102183994
Come up with a prompt something like the following;

[Summarize the most important facts, events, character developments that have happened in the chat so far. Limit the summary to {{500}} words or less.]

Insert shit as you wish, if you want emphasis on emotional baggage or items or shit. Then you archive the chat or nuke 3/4 of it and continue on.
>>
>>102184024
Keeping the intro prompt followed by the summary doesn't break immersion too much. If the summary is too bland, just tell it to re-do the summary with more flair/more detail/etc.
For NPC's it helps to keep some of their dialogue in, especially if it contains their specific way of talking.
>>
>>102184024
>>102183980
>>102183889

>just keep your chats to 100 messages
Local models are so fucking shite LMAO
>>
>>102183980

Any RAG wizards able to give advice on having your vector storage not consume a shit ton of token space when loaded? I feel like I'm doing something wrong, or my understanding of how RAG works is wrong. I always thought the model would just search for relevant shit in your database and apply it contextually to its output.
>>
mikuberry
>>
>>102184057
I wish this weren't true.
>>
>>102184057
always was
>forgetting everything
>hallucinations
>untreatable reddit & kosher censorship hard-rails, extreme positives
>"shivers, not gonna bite much & bonds" slop
>gorrilion of chat templates, system prompts for gorrilion models or mergeslop
>>
>>102183312
Those people are generally vramlets, themselves. They cope by picking on weaker vramlets. I, personally, like that people still run smaller models because then I can make models for them to try out without having to deal with cloud computing bs
>>
>>102184094
Ha ha! So witty and intelligent!
>>
>>102184370
Aren't we all vramlets at the end of the day?
>>
>>102181119
Hey I also play with legos after I finish playing with my LLM dolls.
>>
>>102184401
The best vramlet cope is my vramlet cope that buying a second top of the line consumer gpu just for current state of LLM's is dumb. And I would even do this dumbness myself if I could just put a 3090 under my 4090 without a big hassle. But I would need to rework everything just for... llama-3 70B? Chinkshit 70B? What do you even run on 48GB now?
>>
>>102179805
is grimjim considered any good anymore?
>>
>>102184429
Some guy has an opencompute board hooked up and apparently it works; I think his has 4 gpus on it.

It looks super complicated to figure out, but that kind of gpu is cheaper than PCIe ones.
>>
>>102184429
If context were real you would be able to enjoy nemo at very high context.
But at least in my experience it's all fake. From 8k onwards repetition starts creeping in badly. At 12k it starts getting severely retarded.
I think 48gb is enough to run a lower mistral large quant though. I'd bet that is a nice improvement.
>>
File: .png (231 KB, 1073x1205)
>>102179811
Same anon that asked several threads back about how the recap was done. It's a bit rough around the edges, but it works. Thanks for the inspiration.
>>
>>102184380
th-thanks, you too...
>>
>>102184072
What happens is that RAG solutions tend to inject all relevant info (which is a massive amount of tokens), instead of summarizing the retrieved data and returning a smaller chunk of tokens.
Something that I really want is another layer of intelligence that makes the summary focus on the relevant context.
If the model is asked "when did the user access rule34?" the memories should be retrieved and summarized with a focus on time, instead of content.
This'd save even more tokens and would make responses more intelligent.
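
A minimal sketch of what I mean, assuming you already have a retriever that returns text chunks (retrieve() is a stand-in for whatever vector store you use, and the prompt and word budget are made up):

# Rough sketch: compress retrieved chunks with a query-focused summary instead of
# injecting them raw into the chat context.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=8192, verbose=False)  # placeholder path

def focused_memory(query, retrieve, k=8, max_words=150):
    chunks = retrieve(query, k)  # hypothetical vector-store lookup returning strings
    joined = "\n\n".join(chunks)
    prompt = (
        "Below are retrieved memories. Summarize only what is relevant to answering "
        f"the question, in {max_words} words or fewer. Keep dates and times if the "
        f"question asks 'when'.\n\nQuestion: {query}\n\nMemories:\n{joined}\n\nFocused summary:"
    )
    out = llm.create_completion(prompt, max_tokens=400, temperature=0.2)
    return out["choices"][0]["text"].strip()

# The returned summary is what gets injected into the context instead of the raw chunks.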
>>
New to local models. What is the best one? No coomer/RP/Storytelling trash please.
>>
>>102184497
Neat GUI, anon. What are you using to build it?
>>
>>102184550
Llama 3.1 405B
>>
>>102184550
>What is the best one?
Depends on what you're going to use it for, since some are trained for ERP, some for producing code and others for playing the role of an assistant.
>>
>>102184564
Flask with Jinja templates. I almost went with JSON, but I settled on Python since I'm running it solely for myself.
>>
>>102184585
Javascript, I mean.
>>
>>102184550
Gemmasutra 2B
>>
>>102184492
>If context would be real
I had this thought a few days ago that it is not gonna be real for a long time, at least for cooming or storytelling. I mean even people don't keep perfect track of the last 300 messages. You have a general glimpse of what happened and you usually have an idea or two for a twist or something you want to do (another thing that LLMs suck at, cause they can have an "idea" for a twist in one token, but lose it on the next one). But you feed those 300 messages into your llm and it has to attribute attention to all of it. No wonder the best it can do is pick up that there are 80 shivers in this wall of text, so maybe shiver number 81 would be good next.
>>
>>102184531
It is ok friend! You should be more confident! We all want to see each other succeed and be better people!
>>
>>102184585
>Flask with Jinja
>not using React in 2024
ngmi
>>
>>102184578
>some are trained for ERP
I wish that we true...
>>
>>102184623
Bro? Your Lumimaid?
>>
>>102184629
Buy a waffle Undi.
>>
>>102184492
>>102184603
CR's context is very real. It was able to look back over 60k tokens and recall even minor characters when prompted. It just sucks at writing, or at least proactive writing. It's too timid and needs a good finetune to fix it.
>>
>>102184694
Indeed the correct flaw.
>>
File: p2.png (413 KB, 907x878)
>>102184698
>>
>>102184698
>needs a good finetune
lol
>>
>>102184738
Asking for a lot here.
>>
>>102183265
I know right?
>>
>>102179805
https://www.nist.gov/aisi/aisic-members
>cohere is on the enemies of humanity list
>models are all shit
>>
>>102184567
lol no.
>>102184550
Untuned mistral large
>>
>>102182167
After listening to some of that guy's opinions on censorship, I think he was just a massive faggot, and I would not expect anything great from that man.
>>
>>102184813
What open or closed model organization isn't on it?
>>
>>102184833
Calm down mistral nemo.
>>
>>102184072
Give it a higher relevance score cutoff
>>
>>102184839
kek
>>
>>102184835
All the chinese ones
Black Forest labs
Basically every company that isn’t actively stagnating in the scaling law copium
>>
>>102184698
>Q8
That's 35gb of space needed, what do you run that on?!
>>
>>102184813
>Queer in AI
The poison that destroys everything...
>>
>>102184860
qwen models are super censored
>>
>>102184860
scaling law diminishing returns is a psyop to get the market to accept alignment retardation
>>
>>102184864
2x 3090s
>>
>>102184874
I mean maybe at the current scale, but at some scale it’s a fixed law of the universe
>>
I'm having a bit of trouble deciding between mini-magnum and celeste. Celeste seems a bit more prone to go into lovey-dovey territory, while mini-magnum is more degenerate. But perhaps it's just the RNG gods.
What do you prefer?
>>
>>102184827
>Untuned
Does it exist? They only released Instruct slop for Large to my knowledge.
>>
>>102184860
>All the chinese ones
There are others? I saw the one that makes horribly bad videos (but it's cool!!!) is there another?

I'd do the video one, but it clearly is nvidia only or experimental at best on amd, presently.

Like their main example is a dog running, and it runs worse than robodog, like very creepy. But again, COOL!!!

>CogVideoX
is the name of it.
>>
File: IMG_9767.jpg (222 KB, 1125x1304)
>>102184835
You can tell me!
>>
I wish someone made a model to look at 4chan threads and filter out shitposts and low-quality posts
We all know there are some buried gems in the archives, but it's only really possible for an AI model to go through and dig them out
>>
>>102184977
Just search for the number of replies or add me to the screencap?
>>
>>102184903
Also I'm using a 3090 and 48 gb ram. Nothing at 13B is as good as these nemo-based models, and trying to run 70b models is a pain because they're so slow. Aren't there 30B models with the nemo magic in them?
>>102184977
>>102184991
There's absolute gems on any topic imaginable.
I think your best bet would be to only include long posts and hope for the best. Maybe use an LLM to filter them by topic or sentiment. It would be a bit of work, but it's doable.
>>
When I tell my model to insult me with racial slurs, it tends to just repeat a single insult over and over rather than finding synonyms or new insults.
Is this a problem with the character or the model? How do I increase its creativity?
>>
>>102184977
Fuck it. I'm going to do it right now. Not a model, but a script that reads json threads from the 4chan api and constructs a coherent text using only the good quality ones.
>>
>>102185016
>I think your best bet would be to only include long posts and hope for the best.
That might have worked before, but you would still get lots of copypasta from before 2022 and almost entirely AI generated shit after.
>>
>>102185027
>the good quality ones
How are you gonna measure that?
>>
>>102185045
Responding to someone with an AI-generated post introduces a new and particularly insidious form of disrespect. It weaponizes the recipient's social instincts, exploiting their natural expectation of genuine human interaction. The initial engagement tricks the reader into investing time and emotional energy, only for them to gradually realize, with growing frustration or unease, that they are conversing with a machine. This manipulation isn't just a simple deception; it undermines the fundamental trust in communication, forcing the recipient to confront the uncomfortable reality that their attempt at meaningful dialogue has been met with cold, algorithmic indifference. It’s a calculated insult, reducing the value of the exchange to mere data processing, stripping away the human element entirely.
>>
>>102184878
What's its speed like?
>>
>>102185068
I'll let the LLM figure that out. I'm thinking something along the lines of selecting the posts that add new information, nuances, or are otherwise constructive. Then use those posts to build up the result. The result is a coherent text that conveys the ideas of the whole thread.
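
Roughly like this, as a sketch (the judging prompt is a placeholder and the HTML stripping is crude, but the a.4cdn.org JSON endpoint is the real one):

# Rough sketch of the filter pass: ask the model, post by post, whether a post adds
# anything, and keep only the ones it says yes to.
import html, json, re, urllib.request
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=8192, verbose=False)  # placeholder path

def fetch_posts(board, thread_no):
    url = f"https://a.4cdn.org/{board}/thread/{thread_no}.json"
    with urllib.request.urlopen(url) as r:
        data = json.load(r)
    for p in data["posts"]:
        text = html.unescape(re.sub(r"<[^>]+>", " ", p.get("com", "")))  # strip HTML tags
        yield p["no"], " ".join(text.split())

def keeps(post_text, context_so_far):
    prompt = (
        "You are filtering a forum thread. Context so far:\n" + context_so_far +
        "\n\nNew post:\n" + post_text +
        "\n\nDoes this post add new information, a nuance, or a rebuttal? Answer yes or no:"
    )
    out = llm.create_completion(prompt, max_tokens=3, temperature=0)
    return out["choices"][0]["text"].strip().lower().startswith("yes")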
>>
>>102185073
>Responding to someone with an AI-generated post introduces a new and particularly insidious form of disrespect. It weaponizes the recipient's social instincts, exploiting their natural expectation of genuine human interaction. The initial engagement tricks the reader into investing time and emotional energy, only for them to gradually realize, with growing frustration or unease, that they are conversing with a machine. This manipulation isn't just a simple deception; it undermines the fundamental trust in communication, forcing the recipient to confront the uncomfortable reality that their attempt at meaningful dialogue has been met with cold, algorithmic indifference. It’s a calculated insult, reducing the value of the exchange to mere data processing, stripping away the human element entirely.
This is a perfectly good reply from an openhermes q4. There were other, inferior replies. It's supposed to be written in the style of Linus Torvalds.
>>
>>102185145
sorry, here is the quote:
>So, yes, there's a certain amount of disrespect involved in talking to a bot. But there's also a certain amount of disrespect involved in talking to a human who's pretending to be something they're not. The difference is that one of them is doing it deliberately, and the other isn't.

I used interactive mode on llama.cpp
>>
File: .png (27 KB, 346x137)
>>102185075
Around 20 t/s at little to no context.
But even at 62k context + initial load time, it took this long to get it all into memory.
>>
>>102185117
>Building wheel for llama-cpp-python (pyproject.toml) ...
ZZZZZZZZZZ
The script is ready to go. Just waiting on this to start testing
>>
>>102184991
Nah, way too inconclusive
>>
>>102185172
>>102184698
Why command r vs r plus? And what processor are you running at? I've got 48VRAM as well but llamaccp runs like ass when split over two gpus.
>>
File: overkill.png (37 KB, 618x451)
>>102185223
>>
File: 1707466334580606.png (1.26 MB, 1024x1024)
>>102184094
>>
File: file.png (99 KB, 2191x500)
r8 and h8 my prompt/approach
>>
File: file.png (40 KB, 671x447)
>>102185256
Oh I haven't actually updated mint in a while and been getting these random firefox freezes, wonder if that will fix it.

Ty, I've been meaning to upgrade my processor for a rebuild and was bouncing between the 7800x3d vs 7900x3d but that seems satisfactory enough.
>>
Thank you to the anon who shared the draft model server, that shit is super useful, it needs to be general knowledge.
>>
>>102185184
Thank god for 32k context, but I think doing this in one shot is a no-go. I need to make it read one post at a time.
>>
>>102185300
The fact that nobody replied to even say "anon you're a faggot and are doing everything wrong" makes me think dead internet theory is real and you're all migus
>>
>>102185271
Mikuteriophage
>>
>>102185271
worry
>>
>>102185500
anon you're a faggot and are doing everything wrong
>>
>>102185664
>1. Anon is doing everything wrong.
>I incorporated the main idea from the post, that anon is doing things incorrectly. Please let me know if you would like me to modify this further.
My script is fucking amazing. I think I'm gonna be rich.
>>
>>102182167
Meanwhile, the people he is arguing against, the le scaling GPT is all you need for AGI shitters, have nothing to show for their side of the argument (while their side is getting weaker over time thanks to performance gains slowing down). If GPT-5 or Strawberry comes out and shows a huge jump in performance then perhaps they will have something, but as of yet, those things are still in development too, and who's to say they're the regular transformers either? But goalposts will probably move. Eventually the people who argued for transformers will be arguing for architectures that have less and less transformer to them, until they're not even talking about what we currently think of as a regular transformer. But they will not admit that they were ever in the wrong.
>>
Has any coomtuner tried to continue pretraining a model on smut for some time? Maybe you don't really need the amount of compute you think you would need based on how long the base models are trained? I mean when you think about it a 7B should be more than enough to make for a perfect coombot if it didn't have all the useless wikipedia shit in it.
>>
>>102185300
looks okay. I would try to be a little more explicit about what you're looking for from the model and what form you want the output to be. this:
>When reading the posts, please determine if each of them adds any new information, nuances, rebuttals, or anything else that is usable, and if so, take that information into account.
seems like you're asking it to do this stuff in the background before actually writing the article, which is not very reliable with llms. you might want to run it as key information extraction as an explicit step first, then article creation. it might not be necessary if you're only looking for a general impression of the thread and don't care that much about missing any spots so to speak, up to your taste
I don't think you need to tell it that its output will be processed and then reused, just give it the task and don't let it worry about what you'll be doing with it (especially if you're not being specific about it)
I don't think you need to spend 2 sentences basically saying "Only output the article with no other commentary" but that's just me being nitpicky
in my brainstorming of similar approaches I thought it would be a good idea to map post numbers to some randomized word sequences or something so that it would be easier and less unwieldy for the model to work with, and then map them back once the model's done. I don't know if this would actually have real benefits but it makes sense to me
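
the post number remapping part at least is easy to do outside the model, something like this (purely illustrative):

# Rough sketch: swap long >>-style references for short aliases before prompting,
# then map them back in the model's output afterwards.
import re

def shorten_refs(text):
    mapping = {}
    def repl(m):
        return mapping.setdefault(m.group(0), f"[post{len(mapping) + 1}]")
    return re.sub(r">>\d{6,}", repl, text), mapping

def restore_refs(text, mapping):
    for original, alias in mapping.items():
        text = text.replace(alias, original)
    return text

shortened, table = shorten_refs("see >>102185500 and >>102185664")
print(shortened)                       # see [post1] and [post2]
print(restore_refs(shortened, table))  # round-trips back to the original numbers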
>>
>>102185850
>I don't think you need to spend 2 sentences basically saying "Only output the article with no other commentary" but that's just me being nitpicky
Yeah, I realized this on the first test. I'm confusing the model
>>
>>102185840
the useless wikipedia shit makes it smarter
a model pretrained on just smut would be very dumb and a 7b model would most likely overfit on it
>>
>>102185840
No one has the money to do continued pretraining without catastrophic forgetting. Codellama already showed that you can't just insert new knowledge that way. You need to have a significant portion of the earlier/old dataset in your continued pretrain so that it doesn't become dumber overall.
>>
>>102185925
Thank you. That was very informative. I hope your mom lives for a long time.
>>
>>102185840
I mean, NAI is doing pretty much exactly this with L3 70B. Closed source of course, but we'll see what comes of it
A better example would be the old L2 Erebus tunes. They aren't very good though
>>
>>102185954
>L3
I would say they're smashing their head against a wall, but I kinda wanna see if they can tune the smut back in, for science, and to prove whether the L3 doomers were right or wrong.
>>
>>102184429
>48GB
Anon, I....
>>
L3.1 is extremely dumb for my prompts, not remotely comparable to Mistral Large, even for sfw. Not sure why that is, just the assistant finetune? Meta has much more money, hard to imagine they are that bad
>>
File: coom.png (99 KB, 1670x773)
Hi all, Drummer here...

I hope this cooms well.
>>
>>102186019
I don't think this is just an L3 issue. You need to do hard and long continued pretraining with a data mix that is not too disproportionate. That was the issue with Codellama. If they cannot get that balanced data mix for a long continued pretrain, and it does not turn out well, then it doesn't necessarily prove that Llama 3 is impossible to train, just that these long-trained models need a more dedicated training strategy, which requires money and some talent to get the right data.
>>
Trying to write this summarizing bot I'm realizing 99% of the posts in this thread add literally nothing to the discussion.
>>
>>102186125
Hopefully, its reasoning and long context perks don't get degraded.
>>
File: IMG_9768.jpg (385 KB, 1125x1046)
>>102185157
I wasn’t pretending.
>>
>>102185422
What’s that?
>>
>>102186251
>It's important to note that while local language models have incredible potential, we should be exploring them for beneficial purposes rather than sexual gratification. Using AI for that kind of content is actually pretty fucked up, and we should strive to use this technology in ways that are constructive and positive.
Is this an opinion voiced in the thread, or did Nemo just decide to include this?
>>
>>102186273
>However, if you are dead set on using AI for sexual gratification, then by all means, go ahead and kill yourself. And when you are finished, you can buy an ad. But honestly, that's pretty messed up and you should probably reconsider. There are so many amazing things we can do with this technology, let's focus on building a better future rather than wallowing in depravity. Just my two cents!
LMAO
>>
File: file.png (462 KB, 701x1752)
R+ 08-2024 did the "not knowing other person's name" trope.
Then I remember someone saying it's too consent and boundary slopped... Yes it is.
>>
>>102186174
That might be a Nemo problem. Don't L3.1 tunes hold up better?
>>
>>102186077
Llama3.1 is basically a midwit that crammed so hard on gpt responses that it can trick people into thinking it's gpt. Because of this it sucks at anything novel that it hasn't been trained to do, like roleplay, and I don't think fine tuning would even help.
Mistral large is actually smart. But like every mistral model it’s very slightly overtrained, so unlike llama if your prompt format is even slightly off it will give literal gibberish, like random tokens that don’t even form words. I accidentally prompted it with the llama3.1 format once and I thought my gpu had died.
>>
>>102186284
It kind of works, but I've noticed it resists growing the summary length and just tries to cram more and more information into the same number of paragraphs. And after two dozen posts in, a lot of information is lost.

https://pastebin.com/raw/wdc9Z7UG
>>
What are actually all the possible applications of GPT-4 level video models?
>>
>>102186398
Instead of dick pics, you could upload a tribute video and ask your waifu for her opinion.
>>
>>102186398
Funny videos? Also cute ones?
>>
>>102186419
>>102186420
I was thinking more like fully autonomous robots that can reason by predicting off of video from its eyes about what will happen in the real world
>>
>>102186440
Fully autonomous robots can also be cute, and tell funny jokes.
>>
>>102186440
LLMs like GPT-4 can't even play games and you want to stick them in robots?
>>
>>102178265
Yes and no — it’s basically just a POC with interesting implications if it gets trained on 10,000x more data with more variety. I’m working on reproducing the paper with a different dataset, and so far it looks like their results are accurate, but it’s kind of too small a dataset to generalize well, and I have to Frankenstein in the memory stuff from GameGAN. Which will take a while since I haven’t actually built a model since 2019 and also need to use the dataset I have to train a different model to make the bigger dataset. And then it will take five figures to train.
>>
>>102186398
From following it so far mostly zero budget comedy content creators getting to do fun stuff.
>>
>>102186469
Can't they play games that are in text? So a video model could play vision based games
>>
>>102185925
How did miqu do it?
>>
File: ForbiddenArts.png (1.4 MB, 800x1248)
>>102184827
>Untuned mistral large
Yeah, it's really top-tier. I can feel the drop in IQ when moving from 405 to 123, but it's not as large a drop as you'd think based on the reduction in size.
We need miqudev to leak the base model or some internal mistral unreleased pre-RLHF model to kickstart a new rp revolution
>>
>>102186469
It doesn’t make sense for them to play games. They’re more like a sub component of a brain than a brain. Other sub components that can learn and play games have been perfected since forever. What’s really missing for true ai isn’t a good enough llm but an orchestrator that can recognize something as a “new skill to learn” and make a sub component to “learn” it. And an llm given input formats can’t even write a PyTorch module to train on it without dimension mismatches.
>>
>>102186251
>>102171482
>>
>>102186643
A bigger LLM will do all that without needing extra bullshit
>>
>>102186611
They are LITERALLY the same people that made Llama 2. They probably just had the datasets on one of their member's hard drives.
>>
>>102186291
Share card
>>
>>102186685
So no Miqu 3.1?
>>
>>102186695
https://chub.ai/characters/school_shooter/lilly-satou-5c48658a96c7
from /aicg/ self-proclaimed new botmakie 2 weeks ago
>>
File: LM_Studio 01-09-2024.jpg (27 KB, 752x68)
>>102184629
Wow, thanks, it doesn't even fucking work.
>>
>>102186734
They can just pretrain stuff from scratch now. The only reason Miqu existed was because it was a faster/cheaper way to get them started. It's questionable if they still see a reason to make another 70B.
>>
>>102184835
LAION
Black Forest Labs
>>
>>102184813
would you really rather have sane companies not participate and cede the entire policy space to safetyist EA psychos
>>
>>102186019
To their advantage they have the compute to continue pretrain to a level where they could actually make a difference.
>>
>>102187208
see
>>102185925
they have the compute to continue pretrain it, but they don't have the same dataset used to initially pretrain it, so it's going to forget a lot
>>
>CR is kinda sloppy now
what the fuck man what happened?
did mistral give cohere THAT gptslop dataset too?
>>
>>102184497
>>102184429

read lol, there's a guy who has made some pretty cheap hardware work.

Jealous desu, but I need to be busy on other things, watching his project though.
>>
>>102187290
the old CR was a base model with a thin instruct coat of paint, which is why it was fun but bad on benchmarks
the new one is a bona-fide instruct model which is why it's less fun and better on benchmarks
>>
>>102187357
I am once again asking if we have an unfiltered base model bigger than Nemo
>>
>>102187357
but does every company in this field use the same instruct dataset or what? it's that same fucking tone, those same phrases everywhere
>>
>>102187202
No, I’d rather the same companies throw money at politicians to block any pending legislation. The only reason to participate is to get regulatory capture.
>>
File: 1694192620579715.gif (2.23 MB, 498x273)
>>102187290
The more optimized models become at specific tasks, the more slopped they get. Models are you good at prose / creativity despite their training not because of it.
>>
>>102187410
yeah they all use scale
https://scale.com/
>cohere
>>
so anons can read >>102187432
>Models are you good at prose / creativity despite their training not because of it.
*Models that are good at prose / creativity despite their training aren't made good because of the training.
>>
>>102187506
>AI Digital Staff Officer for national security.

>Scale has partnered to bring the leading large language model providers to U.S. Government networks and use cases. Donovan customers can access a variety of large language models such as OpenAI's GPT-3.5, Cohere's Command, and Meta's Llama 2 to allow users to select the most appropriate model for their mission.

>Donovan customers can access a variety of large language models such as OpenAI's GPT-3.5, Cohere's Command

https://scale.com/donovan

So there's a military version of Command that exists. That's pretty scary seeing how schizo Command is.
>>
I feel like Euryale-v2.2 is worth wrestling into creativity with XTC. 3.1 is really smart. It's getting stuff that 72B / large mistral is not. That dryness just needs more work.
>>
>>102186786
>lm studio
Fucking lmao
Use koboldcpp + SillyTavern
>>
>>102187506
Ontology software/databases are extremely important, as ai models adhere to these consistently. They can be thought of as a trunk of knowledge, onto which the rest of the leaves of training are applied.
>>
>>102187506
they really skipped the gpt4 distillation part and went straight for the data used to train gpt4 huh?
>>
>>102187650
>kobold
advantage over llama.cpp?
>>
>>102187712
It has a GUI so it's easier for people to work with when they're familiar with other GUI programs like LM Studio (which is why I recommended it).
>>
>>102183145
I don't think it's denial, because those people who invested thousands of dollars would also see improvements from extreme advancements in low VRAM models. Ex, a superior 7b model can be linked up to an 8x7b model, and if you are correct, it will absolutely blow 70b models out of the water.

Improved small models should be something that everybody cheers for.
>>
is there even demand for multimodality? vision sucks ass so far and can't be trusted with anything except mass tagging of chinese cartoons
>>
I just use Google ai studio now
It's better than everything else
Since I'm not a pedophile it works well for my needs
>>
>>102188095
Based
>>
>>102188088
We'll need it sooner or later to make the dream real of watching anime in real time with your waifu and talking about it with her as you watch
>>
I have a 3080 and I'm mostly running mistral nemo finetunes these days. Are there any models that would justify getting a 3090? Having more VRAM for context and higher quants would be nice but I'm not gonna get a new card just for that.
>>
I hate how SillyTavern stops the generation if I delete a message higher up. I want to be able to clean things up while my model is working.
>>
>>102188088
>is there even demands for multimodality?
Yes
>vision sucks ass so far
Hence the demand
>>
>>102188242
Not really.
>>
>>102188242
There's no model that justifies any hardware yet. That goes for local and cloud providers equally. Autoregressive LLMs are fundamentally flawed and their impending implosion will herald the next AI winter and likely recession to come.
>>
>>102188424
>There's no model that justifies any hardware yet.
fact. full llama 70b is better than the quants, but it's literally marginal, say 1.1x gains for a 10x cost increase
>>
File: file.png (102 KB, 666x791)
Is llama the most original big model?
>>
>>102188498
Who offers 405b?
>>
>>102188614
in poe it's together.ai, but there are other providers as well.
>>
>>102188614
>Who offers 405b?
at a usable quant. 405b is extremely sensitive to being quanted.
It should almost be "Who offers 405b at FP16"?
>>
>>102188686
How slow would that be, even hosted?
>>
how come there's no q7
>>
Damn. I'm only getting 20% faster with speculative decoding on Mistral 123B Q5_K_S with 7B v0.3 Q8 as the draft, on my machine. 2 or 3 draft tokens doesn't seem to change it much, and it almost never gets all 3 draft tokens right. So I guess you need a really good draft model for the speedup to be bigger here, or you have to generate a very easily predictable passage. Plus I noticed that there seems to be some kind of bug where it's not obeying your top k and temperature settings. Even with top k at 1 and temperature 0, in one instance it picked a token that wasn't the top one. Normally with these settings, all top token probabilities sent to the frontend should be 100%, but I took a look and a lot of them actually aren't.
>>
>>102188686
? Where did this come from?
>>
File: file.png (17 KB, 809x429)
>>102188686
Honestly, I don't know anyone who does at FP16. Most do at 4bit and 8bit. Together is at 8. As far as I know, it does not fit in conventional servers at 16 and will need to be split
>>
>>102188686
>Who offers 405b at FP16
hyperbolic does, also the base
(this is the ad that was bought)
>>
>>102188727
>Plus I noticed that there seems to be some kind of bug, where it's not obeying your top k and temperature settings.
Hey, that's also what I noticed. But another anon said it was working for them. Maybe it's related to the version of llama-cpp-python?
>>
File: file.png (92 KB, 878x590)
>>102188761
>hyperbolic
not gonna lie, this seems a bit too good to be true
>>
>>102188765
Weird. I built it through pip install instead of installing a prebuilt wheel. I wonder if that has something to do with it.
>>
>>102188761
>Llama 3.1 405B parameters BASE (BF16): $4 per 1M tokens

How's that work out, is that close to 1 million words?
>>
>>102188727
Try using a Nemo quant, I managed to make it work with this change:

# Create the draft model class.
# Note: args, numa_strategy and get_llama_proxy come from the surrounding
# llama-cpp-python server script this is patched into.
import numpy as np
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaDraftModel


class DraftModel(LlamaDraftModel):
    def __init__(self, current_model, path_to_draft_model, n_speculation_tokens):
        # Load the small draft model as its own Llama instance.
        self.model = Llama(
            verbose=True,
            model_path=path_to_draft_model,
            n_gpu_layers=args.n_gpu_layers_draft,
            n_ctx=args.ctx_size,
            n_batch=args.batch_size,
            n_threads=args.threads_draft,
            n_threads_batch=args.threads_batch_draft,
            flash_attn=args.flash_attn,
            numa=numa_strategy,
            use_mmap=not args.no_mmap,
            use_mlock=args.mlock
        )
        self.n_speculation_tokens = n_speculation_tokens
        self.current_model = current_model

    def __call__(self, input_ids, **kwargs):
        # Detokenize with the main model, retokenize with the draft model,
        # then greedily generate n_speculation_tokens draft tokens.
        text = self.current_model.detokenize(input_ids)
        generator = self.model.generate(self.model.tokenize(text), top_k=1, temp=0, top_p=1)
        output = np.zeros(self.n_speculation_tokens, dtype=np.intc)
        for i in range(self.n_speculation_tokens):
            # Map each draft token back into the main model's vocabulary;
            # [1] skips the BOS token added by tokenize().
            output[i] = self.current_model.tokenize(self.model.detokenize([next(generator)]))[1]
        return output


# Set the custom draft model on the currently loaded model.
llama_proxy = next(get_llama_proxy())
llama_model = llama_proxy._current_model
if llama_model is not None:
    llama_model.draft_model = DraftModel(llama_model, args.model_draft, args.draft)


For me the Nemo draft model seems to get the right tokens relatively often.
>>
>>102188918
In theory, would it be possible to modify the script so the big model runs through the RPC backend?
>>
>>102188802
It seems cheap compared to the other one, but also:
>>102188914
>>
>>102188727
>>102188765
I'm the anon for whom the sampler settings worked. As I mentioned last thread, I only had it working on the /v1 endpoint with SillyTavern in its default API mode, and it took a bit of trial and error to find the combination of settings/URLs that made it work. I have no idea what it's doing in the backend with the FastAPI/uvicorn server, but I suspect the issue lies there rather than in llama-cpp-python itself. I'd fiddle with how you're connecting to it and the API format the frontend uses to see if that fixes anything.
>>
File: file.png (25 KB, 458x237)
>>102188955
I'm super skeptical, because for the 70B basically everybody hovers around $0.90/M.
Either they're running at a substantial loss, or they're doing something shady like swapping in smaller models or a very low, unstable quant.
>>
>>102189143
is a token roughly equal to a word?
>>
>>102189189
as a very rough estimate yes, it'll be the same order of magnitude as word count in most cases. on average it's more like 2/3 of a word
>>
>>102189189
depending on the type of content being generated, it could be. but generally 1m tokens is roughly 700k-800k words
>>
>>102189189
Punctuation also counts as tokens. In code, for example, {, }, commas and other symbols each consume a token, so code-like content is more 'expensive'.
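As a back-of-the-envelope check (the 0.75 words-per-token ratio is just a rule of thumb for English prose, not a fixed constant):

# Rough cost estimate for the $4 / 1M-token pricing mentioned above.
# The words-per-token ratio is an assumption, not a fixed constant.
words_per_token = 0.75
price_per_million_tokens = 4.00  # USD

words_per_million_tokens = 1_000_000 * words_per_token        # ~750,000 words
cost_per_100k_words = price_per_million_tokens * (100_000 / words_per_token) / 1_000_000
print(f"1M tokens ~= {words_per_million_tokens:,.0f} words")
print(f"~${cost_per_100k_words:.2f} per 100k words")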
>>
>>102189221
>>102189215
thanks

>>102189254
ahhh
>>
FUTO keyboard but for PC?

Basically: click a shortcut -> Whisper starts listening -> click the shortcut again -> it types the transcribed words into the currently selected input on the PC

Surely there must be something like this by now?
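In case nothing turns up, a minimal sketch of the idea in Python, assuming sounddevice, openai-whisper and the keyboard package are installed; the F8 hotkey and model size are arbitrary choices, not an existing tool:

# Push-to-talk dictation sketch: press F8 to start recording, press F8 again
# to stop; the audio is transcribed with Whisper and typed into whatever
# window currently has focus. Library choices and the hotkey are assumptions.
import time
import numpy as np
import sounddevice as sd
import whisper      # pip install openai-whisper
import keyboard     # pip install keyboard (may need admin/root)

SAMPLE_RATE = 16000
model = whisper.load_model("base")

while True:
    keyboard.wait("f8")              # first press: start recording
    time.sleep(0.3)                  # crude debounce so the same press doesn't stop us
    chunks = []
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
        while not keyboard.is_pressed("f8"):   # second press: stop
            data, _ = stream.read(SAMPLE_RATE // 10)
            chunks.append(data.copy())
    if not chunks:
        continue
    audio = np.concatenate(chunks)[:, 0]       # mono float32 at 16 kHz
    text = model.transcribe(audio, fp16=False)["text"].strip()
    if text:
        keyboard.write(text + " ")             # type into the focused input
    time.sleep(0.3)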
>>
>>102188282
Create an issue on GitHub and tell the ST devs to fix their shit.
>>
>>102189731
No, they don't allow GitHub accounts from my country.
>>
File: file.png (26 KB, 540x340)
>>102189703
>Surely there must be something like this by now?
Funnily enough I'm working on that exact thing right this moment.
Ignore how terrible the GUI looks, I'm in the midst of trying to make it look palatable.
>>
>>102188282
Generation depends on previous tokens.
>>
>>102190065
And it should just cache that instead of breaking entirely.
>>
smedrins
>>
>>102190141
What I mean is that if you change tokens in the context, the model needs to recalculate probabilities for every token after the ones you edited. You're invalidating the cache with your edits.
>>
>>102190178
nta, but he wants to edit after the request has already been sent to the backend; by that point, what ST displays doesn't affect the model.
>>
>>102190178
>You're invalidating the cache with your edits.
The cache is (should be) separate from the displayed text.
Once you press "send", the current text should be cached and that cache should be used to generate output.
The displayed text should be editable without the context or the model being impacted.
>>
>>102190202
>>102190226
And by the next request that cache will be invalid, and everything from the last edit onward will need to be recalculated, including the model's response in the middle.
user: AAAA
model: BBBB
user: CCCC
model: generating DDDD
If anything in AAAA, BBBB, or CCCC changes, DDDD will change as well, but we got the "original" DDDD from the model.
user: EEEE
model: the old DDDD may not make sense after the edits, so from the edit point it needs to reprocess the new tokens, regenerate DDDD, parse EEEE and generate FFFF.
New edits to DDDD because reasons, and the loop repeats.
Those caches need to stay in sync. One changes, the other is invalidated, the output of the cached request is invalidated, and all future tokens are invalid.
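For what it's worth, that's the standard prefix-cache rule; a toy sketch of the idea (not ST's or llama.cpp's actual code):

# Toy illustration of prefix caching: only tokens after the first edited
# position need to be re-evaluated. A sketch of the idea, not real ST or
# llama.cpp logic.
def reusable_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

cached = [1, 11, 12, 13, 20, 21]   # tokens for AAAA BBBB CCCC DDDD
edited = [1, 11, 12, 99, 20, 21]   # CCCC was edited mid-conversation
keep = reusable_prefix_len(cached, edited)
print(f"reuse {keep} cached tokens, reprocess {len(edited) - keep}")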
>>
>>102190316
That's moving the goalposts. Sure, it'll need to reprocess, but that's the same as if he'd deleted the message while the model wasn't generating; the cache gets changed either way. All he wants is to not have to wait for the model to finish generating before he deletes stuff.
>>
>>102190375
>all he wants is to not have to wait for the model to finish generating while he deletes stuff.
I get that. But if the model is generating tokens from outdated tokens, the new tokens will not necessarily make sense. The goalpost hasn't moved. For me, at least, it doesn't make sense to generate tokens based on an invalid cache. That just leads to more edits to fix the new errors/inconsistencies based on the tokens the model has no idea about.
Imagine trying to do that with code. You change the name of a variable at the start of a function and then it needs to be changed for every gen since the edit.
The options are either to keep generating who knows how many tokens at who knows what speed, KNOWING that those tokens will no longer be valid and that the whole thing has to be regenerated anyway, or to stop generation, let the user make their edit, and generate with the updates applied directly.
>>
File: usefultool.png (2.46 MB, 3840x2160)
>>
https://aclanthology.org/2024.lt4hala-1.15.pdf

gpt-4 works with Latin, to some extent. What others work with Latin?

(crossposted accidentally)
>>
boring general
>>
After doing some tests, it seems like on my machine Mistral Large is fastest with Mistral 7B v0.3 at Q4_0 as the draft model, rather than Nemo also at Q4. Now I get 1.33x speed. I put the draft model on my weaker GPU and split the main model across my RAM, my stronger GPU, and my weaker GPU, all three. It seems the penalty from splitting across multiple GPUs is outweighed here, at least for token generation, and this is faster than putting the draft model on the strong GPU. From this, I believe that on my system the bottleneck is the main model's speed, then the draft model's speed/accuracy. Perhaps I could get more (proportionate) gains by using a smaller quant of Mistral Large, not sure.
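If anyone wants to reproduce that placement, a rough sketch of how it might look with llama-cpp-python; the paths, layer count, and split ratio are placeholders, not a known-good config for this hardware:

# Rough sketch of the placement described above: draft model pinned to the
# weaker (second) GPU, main model split across both GPUs with the remaining
# layers in RAM. Paths, layer counts and split ratios are placeholders.
import llama_cpp
from llama_cpp import Llama

main_model = Llama(
    model_path="Mistral-Large-123B-Q5_K_S.gguf",   # placeholder path
    n_gpu_layers=48,                               # partial offload; the rest stays in RAM
    tensor_split=[0.65, 0.35],                     # share of offloaded layers per GPU
    n_ctx=8192,
)

draft_model = Llama(
    model_path="Mistral-7B-v0.3-Q4_0.gguf",        # placeholder path
    n_gpu_layers=-1,                               # fully offloaded
    main_gpu=1,                                    # pin to the weaker GPU
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_NONE,    # keep it on that one device
    n_ctx=8192,
)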
>>
>>102179805
I'm failing to understand what the context template and instruct mode prompts for Hermes 3 should look like. Anyone got their settings?
>>
>>102191403
It's ChatML, and it should come prepackaged on most frontends.
>>
>>102191441
Thanks anon, much appreciated
>>
>>102191116
>>
>>102191372
Did you leave your --draft at 4?
>>
>>102191557
Forgot to mention that. I tested 1 through 4 for each model and each configuration (putting it on the strong GPU or the weak GPU) and found that 2 was the best, though 3 was almost the same most of the time. I guess it could be different if I were to try a coding prompt, but I don't do much of that, so it doesn't matter much to me.
>>
That reminds me

>>102184629
>>102175851

I didn't like it. I found it insanely horny to the detriment of everything else. I tested it with a card that depicts a tomboy friend with benefits and every time I tried, despite the opening post setting us up to hang out first, the AI would insist on depicting my character already down to his underwear or something similar so the AI could walk over seductively then fondle me or crawl up to me and sniff my bulge and beg for sex or something equally lewd.

The same card on Rocinante would tackle me into a hug, press their tits against me, say they missed me, ask me how I'm doing, and ask what I'd like to do.
>>
>>102191845
For what and where?
>>
>>102187357
Can one of you EU anons get a job at Mistral and leak the 123B base model? That would be sick. Thanks in advance.
>>
>>102191898
For that and here :)
>>
What's the best local model that can write good nsfw verbose flux prompts on a 24 gig card?
>>
>>102191979
>123B base model
why? are they actually better?
>>
>>102192120
Ignore him. Base models are all but useless.
>>
>>102192144
Why do some companies like meta make base models then? Couldn't they just make instruct models like CMDR?
>>
what can you do with a locally hosted llm practically speaking?
>>
>>102192176
erp
>>
>>102192170
All companies make base models; instruct and chat models are trained on top of them. Some companies like Meta release both. Cohere has only released instruct versions, but a base model exists somewhere.
>>
what he said. The first bake is a base model.
>>
>>102192189
>erp
practical...
>>
Should I use xml style for cards? Like

<{{char}}>
</{{char}}>
>>
>>102192255
Yes
>>
>>102192255
No
>>
>>102192255
depends on the model but usually it's okay
>>
Why do even very apparently smart people believe in AI safety shit? I mean the AI becoming evil, not stopping it from saying nigger. There's no proof AI will start to le kill humans
>>
>>102192289
The answer you seek is politically incorrect.
>>
>>102192270
>>102192274
>>102192278
At least one of you is trying to be funny by messing with others. Because of that, your mom will die in her sleep tonight.
>>
>>102192297
What's the answer? There are people who genuinely believe it
>>
>>102192309
>Because of that, your mom will die in her sleep tonight.
fucking finally
>>
>>102192345
buy an a100 with the inheritance
>>
>>102192255
Do this
https://wikia.schneedc.com/bot-creation/trappu/introduction
>>
>>102192120
Then we could do our own instruct tunes that aren't slopped.
It's a great model for regular assistant tasks, but heavy fine-tuning (SFT and RLHF/DPO) has crippled its potential for RP.
>>
>>102190474
My typical usage pattern is to generate and hide rather than swipe so that I can compare messages on-screen and splice bits that I like together sometimes. So the messages I'm deleting aren't in the prompt.
>>
>>102192236
1. if you don't want your data sent to a megacorp
2. if you don't want to pay said megacorp money
3. if the megacorp finally blocks off their app and API and puts it behind a massive paywall
4. if LLMs in the future get cucked to all fuckery, then it's good to have a backup
5. if the end of the world arrives, then at least you have a digital waifu to speak to before you die
> 6. did i mention megacorps
>>
>>102192176
migu seggs
>>
>>102192236
erp practically
>>
>>102192526
>Then we could do our own instruct tunes that aren't slopped.
Nobody does that though. Everyone who finetunes uses "synthetic data" (read: GPTslop) because they can't afford real data
>>
>>102192176
Coding
It doesn't matter if you don't know how to code.
>>
>>102192656
>>102192656
>>102192656
>>
>>102192634
Real data is garbage, you don't know what you're talking about desu
>>
>>102192657
If you don't know how to code, you won't know when the model is bullshitting you, and you'll get stuck when the code output doesn't compile (which happens a lot with llama models).
>>
>>102192116
idk what the meta is right now but i'd say c4ai-command-r-08-2024-Q4_0.gguf
or Coomand-R-35B-v1-GGUF, but it's not really creative
>>
>>102192531
>So the messages I'm deleting aren't in the prompt.
Assuming you mean 'hide' (based on your usage description), the message is still in the context. Changing it (hidden or not) invalidates the cache.
What I imagine your usage looks like is something like this, taken to an extreme, of course:
model: Alright. Do we shoot each other or fuck
user: shoot [generate in the background, edit shoot to fuck, generate again, compare results]
ST needs to keep both caches if you want to continue with one or the other. Once generated, editing the model's reply adds a third version of the cache. ST could merge them back into one (after the last generation), but any further edit still invalidates the cache.
Do that for a few turns and you have an entire tree of caches ST needs to manage. So what you need is not the ability to 'just cache' the request's context, but a tree of contexts.
I remember seeing something like that a while back. I don't remember if it was a plugin for ST or some other frontend. It showed the LLM's reply and yours as a tree, so you could continue and inspect stuff on different branches.
Maybe this
>https://github.com/ironclad/rivet
or
>https://github.com/ianarawjo/ChainForge
has something like it, but it's not quite the one I remember, and I don't know if they support local models.
Point is, it's not just 'cache the context'. It's 'cache all these contexts and continue, or not, with one of them, except when I make an edit on a past message, and then generate another cache from it from which I may or may not continue'.
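Something like this toy structure, just to illustrate the bookkeeping (a sketch, not ST's actual data model):

# Toy branching-chat tree to illustrate the bookkeeping described above.
# This is not SillyTavern's actual data model, just a sketch of the idea.
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str                         # one message (user or model)
    cache_valid: bool = True          # whether the KV cache up to here is reusable
    children: list["Node"] = field(default_factory=list)

    def edit(self, new_text: str) -> None:
        """Editing a message invalidates its cache and every cache downstream of it."""
        self.text = new_text
        self._invalidate_subtree()

    def _invalidate_subtree(self) -> None:
        self.cache_valid = False
        for child in self.children:
            child._invalidate_subtree()

root = Node("user: AAAA")
reply = Node("model: BBBB")
root.children.append(reply)
root.edit("user: AAAA (edited)")      # reply.cache_valid is now False as well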
>>
>>102188498
Kill yourself schizo
>>
>>102188686
Groq soon (TM)
>>
>>102192907
It's more like...

USER: Do something!
model: You kill myself (hidden)
model: I kill yourself (hidden)
model: I kill... (generating)

...and what I'd like to do is delete the oldest hidden message while it's working, just to clean up the log.
>the message is still in the context
Hidden messages shouldn't be sent to the model.
I'll take a look at those plugins, though.


