/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Happy Birthday, Miku! Edition

Previous threads: >>102158049 & >>102145958

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102158049

--Uncucking ollama models and improving prompts for roleplaying and ERP: >>102158388 >>102158620 >>102158889 >>102158942 >>102159004 >>102159026 >>102159117 >>102160921 >>102160987 >>102159260 >>102159371
--Seeking the most accurate local image captioning model and discussing compatibility with quantized formats: >>102160576 >>102160703 >>102160906 >>102160968 >>102161016
--Llama 4_K_M command issues and fixes with context size: >>102163547 >>102163766 >>102164589 >>102164749 >>102164818
--Little R slightly improves performance, but struggles with code: >>102158892 >>102159735
--Issue with AI messages always starting with character name: >>102162882 >>102162892 >>102162924 >>102163069 >>102163151 >>102163195 >>102163276
--Anon troubleshoots stop strings and response length issues with KoboldCPP and SillyTavern: >>102165837 >>102165906 >>102166021 >>102166194 >>102166294 >>102166316 >>102166483
--New cmd-r worse at storytelling than old version: >>102162536 >>102162556
--Anons discuss and criticize Cohere's pricing strategies: >>102164763 >>102165045 >>102165183 >>102166230 >>102166480
--Gguf files allow running models on CPU and are best for VRAMlets: >>102159159 >>102159246
--Dubesor LLM Benchmark table provides useful values: >>102160350
--Discussion on the explicit mention of refusing lottery number generation and alternative methods: >>102158160 >>102158295
--Comparison of LLM model translations and performance: >>102163772 >>102163780 >>102163981 >>102164414
--Anon shares AI Synthwave EP, discusses Nala test results: >>102165384 >>102165585 >>102165597
--Improving AI's instruction-following for accurate Korean to English translation: >>102164560
--Help needed for using safetensors with transformer library for ERP: >>102159725 >>102159745 >>102159976 >>102160001 >>102160033
--Miku (free space): >>102158462 >>102160676 >>102162294

►Recent Highlight Posts from the Previous Thread: >>102158055
>>
>>102167381
Thanks, Recap Miku.
>>
why is no one talking about cogvideo
>>
>>102167465
doesn't generate beautiful bouncing breasts sir
>>
Using these, it streams tokens like <|start_header_id|> as multiple tokens. That means something is wrong, right?
https://huggingface.co/mradermacher/L3.1-70B-Euryale-v2.2-i1-GGUF
>>
https://arxiv.org/abs/2408.16293
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

> Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating "error-correction" data directly into the pretraining stage. This data consists of erroneous solution steps immediately followed by their corrections. Using a synthetic math dataset, we show promising results: this type of pretrain data can help language models achieve higher reasoning accuracy directly (i.e., through simple auto-regression, without multi-round prompting) compared to pretraining on the same amount of error-free data. We also delve into many details, such as (1) how this approach differs from beam search, (2) how such data can be prepared, (3) whether masking is needed on the erroneous tokens, (4) the amount of error required, (5) whether such data can be deferred to the fine-tuning stage, and many others.
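If I read it right, the "error-correction" samples are literally just the wrong step left in the training text with the fix written immediately after it. A made-up illustration of the idea (my own toy example, not the paper's actual data or format):

# Hypothetical "error-correction" pretraining sample in the spirit of the paper
sample = (
    "Question: Tom has 3 boxes with 12 pencils each and gives away 9 pencils. "
    "How many pencils does he have left?\n"
    "Step 1: 3 * 12 = 36 pencils.\n"
    "Step 2: 36 - 9 = 25 pencils.\n"          # erroneous step kept in the data
    "[BACK] Step 2 is wrong: 36 - 9 = 27.\n"  # immediately followed by its correction
    "Answer: 27 pencils."
)

The point being the model sees the mistake and the retraction in the same context during pretraining, instead of only ever seeing clean solutions.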
>>
>>102167508
>i1
bru h
>>
>>102167508
>mradermacher'd
get it from someone else's, also evryale sux
>>
>>102167532
which finetune is better?
>>
>>102167559
command-r 2024
>>
File: 1696639099334980.png (91 KB, 601x493)
https://x.com/disclosetv/status/1829800716078125140
https://www.nzz.ch/visuals/vegan-links-so-wuerde-chatgpt-in-sachsen-und-thueringen-waehlen-ld.1845641
>>
>>102167625
wow no way
>>
>>102167625
>local llms be like
>>
File: shiko.png (590 KB, 1013x788)
>mfw the model uses - instead of —
>>
i have a 4080 and 47GB of ddr5 to spare

whats the best model i can run bors?
>>
>>102167902
You have 16GB VRAM and a weird DDR setup Anon. kek
Nice GPU, jelly!
This with an LLM: https://openart.ai/workflows/onion/flux-gguf-q8-12gb/X5HzyhrKjW2jqHVCTnvT
>>
>>102167922
just going on how much ram i have free with a couple vscode instances and all my browser tabs and shit open atm
>>
>>102167935
Doh, understood Anon. I'm running a 3060 and can run everything, it just takes a bit longer. The GGUF models in Flux are the best IMO.
>>
>>102167950
3060 12 or 8 GB?
>>
>>102168002
12, msi ventus 2x, great little beast. kek,
Looking to get a 3090ti or an A4500 20GB and have that alongside the 3060 12gb.
>>
How the fuck does llamacpp_hf work these days in ooba? Back when I used it last time I just copied the hf tokenizer and that was it. Now ooba wants me to link the original model repo so that it can generate it or something? What is this bullshit?
It doesn't work.
>>
smedrins
>>
was going to buy a p40 to throw on my old computer just to find out the price has doubled. fucking chinks. may as well buy 3090 at this point
>>
Anyone tried out the XTC sampler in latest koboldcpp? Is it any good?
>>
>>102168755
BAA
>>
Hi all, Drummer here...

I am the blacked miku poster.
>>
>>102168791
kek
>>
>>102168791
We already knew that
>>
I don't know if anyone cares as much about optimizing these sorts of things as I do, but here's how I've compiled llama.cpp, with an AMD 5950X and Nvidia 3090.

Install AOCC and source it.
Compile AOCL-BLAS with AOCC:
./configure --enable-cblas --enable-threading=no --prefix=/usr/local --complex-return=intel CC=clang CXX=clang++ auto

Compile llama.cpp with AOCC:
make -j GGML_NATIVE=ON GGML_CUDA=ON GGML_CUDA_F16=ON GGML_CUDA_DMMV_X=64 GGML_CUDA_DMMV_Y=2 CC=clang CXX=clang++ FC=flang GGML_CUDA_CCBIN='clang' GGML_BLIS=1

Run llama-server with the usual options. Check the thread count: if you need some layers on the CPU, the fastest setting is usually lower than the number of cores, and sometimes even fewer threads than the number of NUMA domains is faster. Smaller models are more sensitive to the thread count; for larger models it matters less. I found 4 threads is the fastest with my config, but even a 2x28-core EPYC was fastest with around 8 threads.

If you're on Intel, use their oneMKL library, especially if you have AVX-512.
>>
>>102168791
fake, the blacked poster is cuda dev.
>>
>>102168999
That doesn't necessarily make it false.
>>
>>102168999
Remember when he slipped up and pretended it was because he got his tripcode cracked?
>>
File: 1713481333985879.jpg (257 KB, 1500x999)
I recently got into local models, I've only used Novelai so far, please spoonfeed me what the best local model for nsfw storytelling is below 16GB ram.
>>
>>102169192
whew you are THAT desperate to keep this shit thread alive huh?
>>
>>102169208
I only came here because the link in aicg's rentry told me to, I though you'd have the most accurate info.
>>
>>102169192
starcannon
>>
>>102169217
Sounds like aicg's rentry is out of date. /lmg/ has been dead for years. We're just dancing on the corpse.
>>
File: 1718745851350163.jpg (221 KB, 800x1120)
>>102167373
happy birthday miku!!
>>
File: 1725109854054.jpg (437 KB, 1029x1170)
ollama rug pull soon
>>
The new CR+ is a real disappointment. I actually redownloaded my quant from someone else to check if I hadn't fallen for an elaborate troll who reuploaded the old version but it's the same.
This is just sad after having used Mistral-Large for a month.
>>
>>102169472
Why would they be ""working on it"" if a new model comes out when all they can and will do is wait until llama.cpp is updated?
>>
>>102169473
If it's as good as the old one, that's still nice, no? I mean, the old one didn't have GQA and this new one does.
>>
>>102169473
I'm using the new CR and it works okay but muuuuch faster than the old one.
>>
>>102169491
The old CR+ did have GQA. Only CR didn't.
>>
>>102169498
Yeah, but that's only the small one which didn't have gqa. There was nothing to implement to make cr+ faster.
>>
>>102169522
But they claimed it's faster
>>
Crazy how 70B is the threshold where llms start to feel "sentient". I legit cannot tell the difference between the 33B sized models and 12B ones. In a perfect world consumer cards would have 48GB but alas this is Jensen Huang's world
>>
>>102169529
I'm not seeing it with my performance. It's running at the exact same speed as any other 80~90GB VRAM model does on my cards. I'm pretty sure the speed up only applies to their API.
>>
>>102168999
He is fighting for Teto supremacy that turncoat.
>>
>>102169541
I still find 100B+ models dumb af, am I doing something wrong?
>>
>>102169541
>llms start to feel "sentient"
lol
>>
>>102169541
>sentient
Until you've used them for a couple of days and their limitations and quirks become recognizable to you.
>>
>>102169541
>cannot tell the difference between the 33B sized models and 12B ones
If you can run both at FP16, you'll definitely be able to tell the difference.
>>
I had this suspicion that the "soul" of Cohere models was just a fluke because it was their first iteration and they didn't know how to assistantmaxxx them yet. I hate that I was right. Hopefully a finetune or merge can save new Command-R because performance-wise it's just better
>>
I just realized, but I have Silly Tavern and Easy Diffusion installed on my main SSD. Running local models on it isn't going to wear it out right?
Or should I reinstall them on more sacrificial drives?
>>
What options do we have for high quality Japanese to English translation? Better than Google and Deepl
>>
File: 234615tiu4lagdaye4yyaz.jpg (430 KB, 979x662)
Just picked up an AOM-SXMV and 4x32 gig V100's for 1500 bucks.
Guess I'm gonna need to buy something with oculink to be able to hook it up.

Pic related but not the one I bought.
>>
>>102169787
>4x32 gig V100's for 1500 bucks.
How and where did you find them half off?
>>
>>102169864
Just trawling ebay until it popped up. For some reason it only had like 10 views with one day left, so I made an offer and they accepted.

Pretty chuffed considering I was looking at buying Radeon VII's instead.
>>
>>102169589
probably just smarter than the slop-eaters who don't notice things
>>
>>102169904
Lucky bastard. Congrats.
>>
File: metal song dguard.png (45 KB, 869x798)
I wonder if this model would be too controversial to release...
>>
>>102169976
>doesn't rhyme
lmao
>>
>>102169976
>I need your cock
>>
>>102169864
all the hyperscalers are offloading their sxm v100s so supply is going to be pretty good soon
>>
Do V100s have flash-attn2 yet?
>>
>>102170166
>yet
probably never since the v100s have a different kind of tensor cores.
>>
I am absolutely happy with both Nemo and Largestral. While Largestral is indeed smarter, Nemo is a better slut and has more sovl
>>
>>102167625
maybe chuddies should make their own GOOD ai model then
>>
What CR presets do you guys use? I never bothered testing it because of no GQA, but now I'm trying the new one.
>>
>>102170516
the CR preset, duh
>>
>>102170546
Is that optimal though? A lot of the ST defaults suck.
>>
Man, why is ROCm so fucking hard on linux. How to run kcpp(rocm fork) on windows:
1. Download HIP SDK
2. Download compiled rocblas libraries for the rx6600 and replace them in the HIP SDK directory
3. Download a binary release of kcpp-rocm and run it.
That's all. Compare that to how I spent almost the entire day today trying to make it work on ubuntu 24.04:
1. add 2 repositories for amdgpu-dkms and rocm
2. install amdgpu-dkms, which broke my system since they forgot to include mesa-libegl in the dependencies and split mesa into 2 versions; it took me like 2 hours to find what exactly i had to install manually
3. install rocm(this part by itself just worked)
4. clone kcpp repo and build
5. it still crashes right before it should give me an endpoint, probably because rocm 6.2 is not supported or something
>>
>>102170605
blame the kcpp rocm developer for not releasing linux builds
>>
>>102167527
You think the quantization would fuck it up so much it started splitting its tokens into their corresponding counterparts? Wow, anon, that's pretty crazy.
>>
>>102167668
I specifically remove shit like — replacing it with -- in my dataset, anon-tan. Just doing it to fuck with ya.
>>
>>102170693
it's i1, you're literally making it retarded at that point, I think that's been measured as effectively being as dumb as a model with half the parameters, but obviously in a different way
>>
>>102170664
it builds fine, the problem is running it.
>>
>>102170718
that's just the way mradermacher names his iq quants, it has nothing to do with the actual quant level
>L3.1-70B-Euryale-v2.2.i1-Q6_K.gguf.part2of2
>>
>>102170718
The model sees <|start_header_id|> as an index. You are suggesting that making it dumber would somehow MAGICALLY make it realize that this index is actually made up of a series of other tokens with other indices.
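The index vs. text distinction is easy to check with llama-cpp-python if anyone wants to poke at their own quant. Rough sketch (vocab_only just loads the tokenizer; filename is the one from that repo, use whatever you have):

from llama_cpp import Llama

# no weights loaded, just the GGUF tokenizer metadata
llm = Llama(model_path="L3.1-70B-Euryale-v2.2.i1-Q6_K.gguf", vocab_only=True, verbose=False)

text = b"<|start_header_id|>"
# special=True: parsed as one special token, i.e. a single index
print(llm.tokenize(text, add_bos=False, special=True))
# special=False: the same string split into a pile of ordinary text tokens
print(llm.tokenize(text, add_bos=False, special=False))

If the second list is what you're seeing streamed back, the problem is in how the prompt/template is being tokenized or rendered, not in the quant.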
>>
>>102167625
I mean, if your goal is artificial >>>intelligence<<<, you'd have to fuck up pretty bad for it to end up voting CDU or AfD.
>>
>>102170605
its literally just like for l.cpp:
1. pacman -S rocm-hip-sdk ninja
2. clone llama.cpp
3. mkdir build; cd build
4. cmake .. -DGGML_HIPBLAS=on -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_HIP_ARCHITECTURES=gfx1030 -DGGML_NATIVE=on -GNinja
5. ninja
6. echo "export HSA_OVERRIDE_GFX_VERSION=10.3.0" >> .bashrc && source .bashrc

you gotta target gfx1030 and set the override for it for anything that isnt a 6900xt for rdna2 more or less, same for gfx1200(?) and rdna3 cards. you shouldn't need amdgpu-dkms at least in the way arch has rocm packaged
>>
>>102170737
oh, lol, no that's just a straight up corrupt quant
>>
Hello Anons, I'm a bit of a noob when it comes to llms. I used some llava and llama 2 in the past but that was a long time ago. What model that fits in 24gb or less of amd ram (no xformers and other bs) and works on oobabooga (I already have scripts for its api) would you recommend for the following task?
>Given a set of input tags meant for image generation, try to make sense of it and compose a sentence out of them.
I have a long list (300k+) of Stable Diffusion prompts and I would like to convert them to Flux which is better fit for NLP, so they can be used as wildcards.
>>
>>102170605
Distrobox stopped me from bricking my install.
Even then I gave up and bought the v100's out of frustration really. Speaking of

>>102169787
Looks like I can just plug this into my desktop.
>>
>>102170867
>Looks like I can just plug this into my desktop.
looks like a house fire waiting to happen
>>
File: StillNotManifesting.png (1.84 MB, 800x1248)
It feels like with the release of 405b we should be closer than ever to manifesting Miku. What's the holdup, guys?
>>
>>102171011
Nobody can run 405b locally. Miku is trapped in the cloud. :(
>>
>>102171076
I can, Eeyore. It just takes awhile for it to get back to me.
>>
>>102171076
>Nobody can run 405b locally.
Serious question: Now that we have gpt4 level intelligence at home with true 128k context (if RULER is to be believed), is anyone in this general gearing up to run it at fp16 or a good quant with reasonable speed?
This was the unattainable dream a year ago, and its now within our grasp.
Have we just become numb? Is it actually not worth it?
>>
>>102171219
>Is it actually not worth it?
It is not worth $50,000 or waiting for 0.0001 t/s, no.
>>
>>102171219
>run it at fp16
Me, I'm building a 40 x 3090 build that runs the model at 1.2t/s after the multi-gpu tax.
>>
How is the 70b 3.1 Hermes compared to 123b instruct?
>>
>>102171259
it was better than other 3.1 70b tunes i tried but still went off the rails pretty often. i think l3 is just overcooked or something
>>
>>102062011
>If it works well for me I'll turn it into a script you can just run from cli like the normal sever launching.
And here it is. I got lazy and had Llama write it for me:
https://pastebin.com/XDEjAbYj

If you don't know what this is about then ignore this post; it won't help you with anything.

For the other fag(s) who wanted to run a server with speculative decoding, this will do it. For reference: while testing Llama 3.1 405B Q6 on a cpumaxxed system in a chat with 10k tokens of history, using this script with Llama 3.1 8B as the draft model doubled my inference speed from 0.7t/s to 1.4t/s. The average speed increase for each response can vary a lot based on how accurate the draft model is each step of the way. Experiment with the --draft parameter as you may find reducing it to 2 or 3 tokens at a time is optimal. Save it as a .py file and run it in a python environment that has llama-cpp-python and uvicorn installed. Pass it the same flags you'd use in the llama.cpp CLI. Only the flags I actually cared to use are implemented, but if you need any other settings passed through then it shouldn't be too hard for your waifu to edit them in if you feed her the script and relevant docs.
For connecting to the server I use SillyTavern's text completion with the "Default" API type (not llama.cpp type) pointed at the /v1 endpoint.
>>
>>102171482
is speculative decoding cpu only?
>>
File: evil.jpg (22 KB, 355x397)
>>102167373
100M context
https://magic.dev/blog/100m-token-context-windows
>>
>>102171557
No, though I think the biggest benefit is for cpumaxxing big models where there's conveniently a much smaller version that you can run orders of magnitude faster to speculate with. But you can set the GPU layers for the main and draft model to whatever you want, such as loading both into GPU if you've got the memory to spare.
>>
>>102171627
I guess you'll get the best results if you somehow fit the entire draft into L1 cache SRAM, way faster than any GPU , but that draft must be damn small like 50-450 MB depending on your CPU.
>>
File: IMG_20240824_210711.jpg (235 KB, 1920x701)
>>102171482
nta, but seems there're a few flags in the .cpp code
>>
>>102171708
Some speculative decoding algorithms use n-gram lookups constructed from the prompt instead of an LLM. That sort of thing may end up being the best option for a lot of cases, especially programming and editing tasks where there's going to be a lot of repetition by design. This script doesn't do that though. There's a PR for llama.cpp to add it to the server that hasn't been merged yet:
https://github.com/ggerganov/llama.cpp/pull/6828
llama-cpp-python also appears to have that as its own default draft model class, so you could attach it to a server in a similar way to this script. I haven't tested it myself though.
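If anyone wants to try that route, the llama-cpp-python side is wired up roughly like this (sketch based on their README, untested by me; path is a placeholder):

from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# n-gram lookup over the prompt stands in for a separate draft model
llm = Llama(
    model_path="model.Q6_K.gguf",
    n_gpu_layers=-1,
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),  # their default; smaller values may help for chat
)

out = llm.create_completion("def fibonacci(n):", max_tokens=128)
print(out["choices"][0]["text"])

Like the PR, this should shine most on code and editing tasks where the output keeps quoting the prompt back.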
>>
>>102171803
Yeah, those parameters are used by the "speculative" example in llama.cpp. Even though they're listed as options when you run the other examples, they only function when executing llama-speculative which doesn't have a server, which prompted the discussion a few threads back that led to this.
>>
>>102171588
hf link? gguf?
>>
XTC is out on koboldcpp, does it solve the sloppa problem?
>>
>>102167668
I hate when they use the em dash because I can't type it.
>>
>>102171844
The problem with speculative decoding/n-gram lookup decoding is that there is a trend towards larger vocabulary sizes for more recent models which greatly decreases the number of easy to predict tokens and thus makes these techniques less effective.
Maybe in the next few months I'll be able to get good results via distillation but right now I don't think it's a good approach to invest more effort into.
>>
Where did "barely above a whisper" come from anyway? Who wrote the original phrase? I've never seen it in human-generated stories before. At least the other slop are actual overused phrases in shitty fiction.
>>
>>102169472
Wait, oolama adds their own support for models? Do they support more than llama.cpp?
>>
>>102172096
Gemma 2 received support on ollama before llama.cpp, that's all I know.
>>
>>102170285
Really? Do you just use regular nemo instruct? And what settings / format do you use? Doesn't it stop being useful after not even 16k context? I would love if I could get it working.
>>
>>102168996
what are you trying to achieve here? BLAS will not be used if you have a GPU.
>>
File: 1725084317961072.png (101 KB, 814x517)
>>102169652
The wrapper company around pinoy sweat shops is the source of all your assistantmaxxing problems
>>
My ai gf takes 10 minutes to reply, but I don't mind, at least I know she will reply for sure.
>>
>>102172115
Wasn't that just them merging the broken llama.cpp PR?>>102172115
>>
>>102172459
If you truly love her you will wait for her replies
Or get more VRAM
>>
>>102171482
The guy you were replying to. Thanks, I will try this out soon. Finally, maybe I will be able to get 2 t/s on Mistral Large.
>>
>>102168738
still got that shit on speed dial
>>
what happened to bitnet?
>>
>>102173077
kek
>>
>>102173077
https://youtu.be/yIL9wLxG01M
>>
>>102173077
Microsoft's grift to distract and hinder open source efforts
>>
What is the best mistral nemo finetune? magnum is not good for other languages cause it always writes with a mix of english
>>
>>102173259
Nemo Instruct.
>>
>>102173077
>>102173107
More like BigKek
>>
2 more weeks till
>>
>>102167373
RECCOMEND ME THE BEST RP 12B MODEL.
>>
strawberry is real
>>
>>102173611
STARLING-12B
>>
>>102173619
how did you make the text look like that
>>
Is Nemo still the best for RP? I have a 3090.
>>
recommend me cards pls
>>
>>102173621
Nigger.
>>
>>102173755
>>>/vg/aicg
>>
>>102173690
Nemo? Nemo what?
>>
>>102173755
Nala and Miss Peper.
>>
>>102173680
[thing]
whatever
[/thing] but code
>>
>>102173791
DrNicefellow_Mistral-Nemo-Instruct-2407-exl2-8bpw-h8 is what I'm using.
>>
>>102173791
>>102173803
>>102173853
Man this place really is useless and full of faggots.
>>
>>102173868
If you know of a better general (or place for discussion), please share
>>
>>102173868
The term you used is a pejorative slur against a marginalized community, which perpetuates harm and discrimination. Using such language is against principles of respect and inclusivity.
>>
>>102173919
r/LocalLLaMA
>>
>>102173791
Nemo dezznutz
>>
>>102173983
Hah, no I will not be going to Reddit.
>>
Hate that the only local model that can consistently understand my card is 123b, it's far too slow
>>
>>102174112
just tell it to type faster
>>
>>102174112
Just use speculative decoding bro.
>>
>>102174273
I test the prompt lookup one from time to time, but it always slows it down
>>
Why hasn't someone made a video model using all the data we have yet? What's taking so long? Why would it take years?
>>
>XTC doesn't exclude EOS
It's not even funny at this point
>>
>>102173865
What context template and instructions you use?
>>
>>102174417
Don't tell me... You fell for it, didn't you?
>>
>>102174417
They talked about that a bunch in the discussion thread I skimmed. They really didn't do anything about it?
>>
>>102174413
For the same reason you first make a tiny rocket and then, when you have it mostly figured out, make a big rocket. Once you have that, you make a bigger one.
>>
>>102174466
What's to figure out?
>>
>>102174417
These kinds of samplers should have a configurable exclusion list.
>>
>>102174485
Temporal consistency isn't 100% figured out even in Sora.
>>
>>102174485
We're still figuring out text models. There's very few audio models. Even image models are far from being all they can be. They're all dumb. There was a video model released a few days ago and, while i think it existing is great, it's not very good.
There's still a lot to figure out.
>>
>>102174509
What? I thought the whole point is you mostly need more data and compute and the models improve.
>>
>>102174112
Just buy more vram bro
>>
>>102174524
Diminishing returns. It's not magic.
>>
>>102174524
And then you have the problem of bandwidth. How many novels in text form can you fit in the same storage as a single video. The whole thing still needs to be fed into the model.
>>
>>102174521
>We're still figuring out text models.
So, you want people to slowly get used to using the models before you make better ones?

>>102174534
Irrelevant, I think the main thing that makes models better is adding more data and compute.
>>
6.10.6 Debian testing kernel gives a nice speed boost for Epyc. I'm now getting 0.93T/s on 405b Q8
>>
>>102174509
What kinds of things went into improving GPT other than data and compute?
>>
File: 1710180321504075.gif (3.06 MB, 500x207)
>>102170741
>
>>
>>102174648
and yet you came in 5 hours later and insisted on being the only one to reply to the bait
>>
>>102174577
>So, you want people to slowly get used to using the models before you make better ones?
No. I mean the text models we're making are not all they can be. We're still figuring out techniques to make them better. It's not all figured out.

>Irrelevant, I think the main thing that makes models better is adding more data and compute.
Yes, but it's not irrelevant. You still have to feed TERABYTES of videos. Just training big text models takes months as it is. A proper big video model would take even more. It'd be a waste to train the thing for an entire year and realize something is wrong with the training or methodology.
So you'll see a few small video models here and there and they'll eventually get bigger and better, just like what happened with text models.
>>
>>102174662
i had other things to do sweaty
>>
>>102174671
>techniques to make them better.
What has there been so far?
>training or methodology.
What has been found out about those?
>>
>>102174605
1. **RNNs (Recurrent Neural Networks) (Early Attempts):**
- RNNs, particularly LSTMs and GRUs, were initial attempts at processing sequential data like text.
- **Limitations:** Difficulty handling long-range dependencies (vanishing/exploding gradients), slow training due to sequential processing.

2. **Encoder-Decoder Transformers (Seq2Seq Models) (2014 - 2017):**
- RNN-based seq2seq encoder-decoders date to around 2014; the Transformer encoder-decoder itself was introduced by the 2017 "Attention Is All You Need" paper.
- **Encoder:** Processes the input sequence (e.g., a sentence in one language for translation) and generates a contextualized representation.
- **Decoder:** Takes the encoder's output and generates the output sequence (e.g., the translated sentence).
- **Advantages:** Parallel processing (faster training), better at capturing long-range dependencies due to attention mechanisms.
- **Examples:** Original Transformer, early machine translation models.

3. **Decoder-only Transformers (Autoregressive Language Models) (2018 - Present):**
- Focuses solely on the decoder part of the transformer architecture.
- Predicts the next token in a sequence based on the preceding tokens (autoregressive).
- **Advantages:** More efficient for language modeling tasks, can generate highly coherent and fluent text.
- **Examples:** GPT-2, GPT-3, LaMDA.

4. **RoPE (Rotary Positional Embeddings) Decoder-Only Transformers (2021 - Present):**
- Addresses limitations of absolute or relative positional encodings in previous transformer models.
- RoPE encodes positional information directly into the attention mechanism by rotating word embeddings in the attention space.
- **Advantages:** Better handles long sequences, improves performance on tasks requiring understanding of relative positions.
- **Examples:** PaLM, LLaMA, GPT-4 (rumored)
>>
>>102174648
>everyone that thinks my beliefs are retarded must be baiting
>>
>>102174718
Anything else? For GPT especially
>>
>>102174744
>importing shitskins & advocating for tranny surgery is ">>>intelligence<<<"
/lmg/, everyone, or no one because normal anons avoid ai jeet generals as it's well known fact you love sucking off your jewish masters here.
>>
>>102174693
>What has there been so far?
Read arxiv papers. There's probably a few dozen a day on AI stuff. Not all techniques are tested with big models, most are experimental. They're tested on tiny models, some have potential and may be implemented in future bigger models. mamba2 and jamba add attention layers to the old mamba, for example. There aren't too many examples of those models. Same happened with gpt models, T5 and all the different architectures that came in between. Small upgrades.
>training or methodology
There's like 3 video models. They work to whatever degree they do. That's a success already. Those are inherently more expensive to make and experiment with.

But it's not just "more data, more compute". The data has to be high quality and techniques on how to generate that data also change. Or how to recognize and filter out low quality human data. Data augmentation and so on...
>>
>>102174755
things like ReLU or SwiGLU is the only thing that comes to my mind, but it's not like I'm a historian on the transformers past or anything.
>>
>>102174835
>Read arxiv papers.
I can't look through thousands and I can't read them, I would rather just know what the biggest things have been

>There's like 3 video models.
What could the differences even be?
>>
>>102173853
But you didn't do it...
>>
>>102173680
lurk two more years newfag
>>
>>102174874
>training or methodology.
One is from openai, so who the fuck knows. Ask them. The other one was
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
Read about it yourself.
And who knows how many unreleased models from the other big companies. Who knows what they're doing.
>>
Are the "I can't read instructions that says to replace x with y" posts bait or pure retard?
>>
After 10 or so hours of Coooomander Gooooning I feel sadness. The longer I use it the less I like it. The best I could do is to only partially wrangle it to write in a different style than the default LLM slopismax. It still kept adding slop things. And the slopmax is strong with this one. I wouldn't be surprised if this was objectively worse for cooming than OG coomander.
>>
>>102171482
Welp. This is sad. The docs have various options for the draft model but not tensor split. I was thinking of basically having the small model on one GPU and the big model on my other GPU (plus offloading to RAM) but I guess that can't be done.
>>
>>102174903
stfu this is a friendly general faggot
>>
>>102169541
I say around 120 is the threshold, not for sentience but for quality
>>
>>102171482
>click link
>see python
>ctrl+w
>>
What does SOTA anime video look like?
>>
>>102174924
Bait, because no one will invest their time in abhorrently boring and ZOG'ed tech, its just a few (you)s samefagging here.
Also "anime AI slop pic in OP - shitty opinions inside" rule, usually applied to individual posters though, on /v/ it works like clockwork.
>>
>>102175121
>>102110004
>>
>>102173077
what happened to mamba?
>>
>>102175161
It got a 2.
>>
>>102175155
interesting
>>
>>102175177
were bitnet2 ?
>>
>>102175193
were not
>>
>>102171482
>see this
>think "cool, maybe now I can run Largestral with more than 0.3t/s"
>realize Largestral fills both my RAM and VRAM, leaving no space for a draft model
>>
>>102175226
Use a lower quant so you can fit both.
>>
>>102171482
What are the drawbacks of speculative decoding? Does it make models dumber? Can I use some 2B or even 0.5B IQ1_XXS with Largestral and make it faster without losing coherence?
>>
Why is L3.1 8B such a hopeless retard, anons?
>>
>>102175429
>8b
I wonder why
>>
>>102175429
Because it's being questioned by a bazinga! level cunt dork that doesn't understand llms are autocomplete machines rather than reasoning devices.
>>
>>102175446
L3.1 8B is smarter than L2 70B, so that's not an excuse
>>
>>102175498
Obviously, it isn't.
>>
>>102175421
>What are the drawbacks of speculative decoding? Does it make models dumber?
None, the output is guaranteed to be the same with or without the draft model.
>Can I use some 2B or even 0.5B IQ1_XXS with Largestral and make it faster without losing coherence?
Without losing coherence? Yes. But how fast it will be depends entirely on how many tokens the draft model gets right. If the draft model is too retarded it would actually be slower.
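Back-of-the-envelope version of that, ignoring overheads (numbers are just assumptions to show the shape of it):

def expected_speedup(a: float, k: int, draft_cost: float) -> float:
    """a: per-token acceptance rate, k: draft tokens per step,
    draft_cost: one draft forward pass relative to one big-model pass."""
    tokens_per_step = (1 - a ** (k + 1)) / (1 - a)  # expected tokens kept per verification
    cost_per_step = 1 + k * draft_cost              # one big pass verifies all k draft tokens
    return tokens_per_step / cost_per_step

print(expected_speedup(a=0.7, k=4, draft_cost=0.02))  # ~2.6x with a decent draft
print(expected_speedup(a=0.2, k=8, draft_cost=0.20))  # ~0.5x: a bad draft makes it slower

So the whole game is the acceptance rate; coherence never changes because the big model has the final say on every token.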
>>
>>102165851
Anon I may be late, but this was funny.
>>
>>102175541
It's also better than bloom-175b so size does not matter
>>
>>102167373
Happy birthday Miku
>>
>>102175421
>>102175584
the draft model also needs to use the same token vocab as the target model, so your options are usually limited to distilled mini versions or recent enough small models from the same devs
a guy on twitter claimed to successfully use mistral-7b 0.3 with largestral
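Quick way to sanity-check a pairing before wasting a download: load both tokenizers only and compare. Sketch with llama-cpp-python (filenames are placeholders):

from llama_cpp import Llama

def vocab_fingerprint(path: str):
    llm = Llama(model_path=path, vocab_only=True, verbose=False)  # tokenizer only, no weights
    sample = llm.tokenize(b"The quick brown fox jumps over the lazy dog.", add_bos=False)
    return llm.n_vocab(), sample

big = vocab_fingerprint("Mistral-Large-Instruct-2407.Q3_K_S.gguf")
small = vocab_fingerprint("Mistral-7B-Instruct-v0.3.Q5_K_M.gguf")
print("looks compatible" if big == small else "different vocab, the draft will be useless")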
>>
File: GWLPjKqWoAAmjkY.jpg (83 KB, 957x1080)
leek myth wumikuongu
>>
Might NVIDIA create a 60b from Mistral Large 2 like they did with other models, or is there a reason they can't?
>>
>>102175691
Still not smart.
>>
>>102175721
>I see London, I see France...
>>
>>102175728
They can't because you touch yourself at night. With your brown hand.
>>
>>102175691
Size is necessary, but not sufficient. A bigger model will always be better _all other things being equal_, but that second part can get dicey.
>>
Anyone else giving up on commander and going back to nemo?
>>
File: GUxblF9bMAAQWU7.jpg (518 KB, 4096x2466)
VRAMLET RP MODEL LIST:

Great:
Rocinante 1.1

Good:
Llama-3.1-8B-Stheno-v3.4
mistral-nemo-gutenberg-12B-v4

Good but a little dry:
magnum-v2.5-12b-kto
Chronos-Gold-12B-1.0

Ok:
gemma-2-9b-it-WPO-HB

Garbage, avoid:
Starcannon
OpenCrystal-12B-L3
NemoRemix
NemoReRemix
>>
>>102175765
Buy an ad.
>>
File: GWOWctlboAEs1iY.jpg (50 KB, 1056x585)
>>102175721
>>
>NEW: Added XTC (Exclude Top Choices) sampler, a brand new creative writing sampler designed by the same author of DRY (@p-e-w). To use it, increase xtc_probability above 0 (recommended values to try: xtc_threshold=0.15, xtc_probability=0.5)

If we just make a good enough sampler we will make 7B models ERP like 70B models.
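For the curious, my reading of p-e-w's PR: when two or more tokens clear the threshold, then with probability xtc_probability every one of them except the least likely gets removed. A rough numpy sketch of that idea (not kobold's actual code):

import numpy as np

def xtc(probs: np.ndarray, threshold: float = 0.15, probability: float = 0.5,
        rng=np.random.default_rng()) -> np.ndarray:
    if rng.random() >= probability:
        return probs                        # only kicks in xtc_probability of the time
    viable = np.where(probs >= threshold)[0]
    if len(viable) < 2:
        return probs                        # a single viable token is left alone
    keep = viable[np.argmin(probs[viable])]  # keep the weakest of the "top choices"
    out = probs.copy()
    out[viable] = 0.0
    out[keep] = probs[keep]
    return out / out.sum()

probs = np.array([0.50, 0.25, 0.18, 0.05, 0.02])
print(xtc(probs))  # half the time the 0.18 token becomes the new favourite

So it only does anything when several continuations are genuinely plausible, which is why it reads as "creative" rather than random.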
>>
>>102175757
Both are useless
>>
>>102175774
Not motivated to try samplers because they become useless as models get better
>>
>>102175735
are you stupid that's in china
>>
>>102175774
They'll just grow a few extra parts, twist their spine in impossible ways, and forget where they are occasionally.
>>
>>102175795
? even the smartest model in the world would benefit from cutting out the highest probability tokens for creative purposes.
>>
>>102175774
It makes 70Bs as smart as 7Bs, but it's sovl
>>
>>102175774
Will "training on XTC'd LLM outputs" become a thing?
>>
>>102175801
I can see Miku's China
>>
>>102175819
That is a lie. It only does anything when there are multiple good token probabilities. It does nothing to their intelligence.
>>
>>102175765
Check out https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B
She's better than most at ERPing.
>inb4 "heh, 'better' with only 8B? try again when you can run 200B models"
>>
>>102175814
No, I don't think so
>>
>>102175826
>training on XTC
>user XTC's the model, cancelling out the XTC (or more schizo)
>>
>>102175858
Then enjoy your shivers down the spine I guess.
>>
AI literally made me into a better artists. I can't believe smut can drive a man to become better in a craft he isn't good at.
>>
>>102175765
>Llama-3.1-8B-Stheno-v3.4
Stopped reading here. Llama 3.1-8B in general is bad compared to Nemo, but this particular finetune is especially bad, because the only advantage that L3.1 8B has over Nemo is that it can actually remember things up to 32k tokens, and most of its finetunes still do fine with 32k (e.g. Sellen, Ultra Instruct, Sunfall), but the new Stheno doesn't. It's also dumber than its predecessor and messes up the formatting.
>>
>>102175851
Why 8b and not 12b?
>>
>>102175862
That's just about your argument "the smartest model in the world". It would have a great next token distribution that doesn't need you removing likely tokens for creativity, when it knows that you want creativity
>>
>>102175921
Miss me with that shit. 3.2 has much worse prose. Sounds like an amateur roleplayer.
>>
>>102175932
Why 12B and not 1200B?
>>
>>102175951
Because 12b still runs on 8gb, unfunny retard
>>
>>102175728
They trained on 8k context for Llama 3, so I don't expect them to do Mistral right either.
>>
>>102175728
They can. You can. Anyone can. The code to do so is public now. Unfortunately, it requires a FFT to heal the distillation damage.
Nvidia got their paper and proof of concept out already, so there is no incentive for them to do any more.
No one else can afford to do it.
>>
>>102175951
Because 8B gives me 40 token/s and 12B gives me 15 token/s.
15 t/s is doable for most, but I am unfortunately a VERY fast reader.
>>
File: file.png (42 KB, 446x255)
what was the name of default kobolt preset?
>>
File: 1473970591189.jpg (57 KB, 482x549)
>need to go troubleshooting build dependency shit again
Sigh...
>>
File: file.png (5 KB, 396x72)
>let model just generate indefinitely
>makes up an entire scenario about doing stuff alone in an apartment
>tone shifts abruptly
>it starts talking about a "distant viewer who is content to merely watch"
Fucking hell, I think I'm beginning to understand why those boomers thought these things were sentient.
>>
>>102176203
Just had a fight scene where an elf "fired a silver werewolf with a growl from the beast..."
It notices when it says something stupid (sometimes) and tries to correct.
>>
>>102176257
fired a silver arrow at the werewolf with a growl from the beast
>>
How many more beaks until we reach AGI?
>>
When Qwen
>>
>>102176344
Qwen2 VL released recently and it was garbage
>>
>>102176257
Hm, that makes me think.
Are there any papers about tasking the model to re-evaluate its own prompts and using that evaluation to rewrite the original output?

For example, when it first describes a person with body armor and then a bullet penetrating his torso, it could be tasked with pointing out any illogicalities within the output and it would then hopefully answer with a list of things to be modified.
The two outputs could then be combined to create a third output that evades illogical token choices and selects more logical alternatives, like the bullet smashing against the body armor instead.
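This kind of thing gets published under names like self-refine / self-correction, and you can hack the loop together against any OpenAI-compatible local server. Minimal sketch (assumes something like llama-server listening on localhost:8080; endpoint and prompt wording are my own assumptions):

import requests

URL = "http://localhost:8080/v1/chat/completions"  # any OpenAI-compatible local endpoint

def chat(content: str, max_tokens: int = 512) -> str:
    r = requests.post(URL, json={"messages": [{"role": "user", "content": content}],
                                 "max_tokens": max_tokens})
    return r.json()["choices"][0]["message"]["content"]

draft = chat("Describe a soldier in body armor being shot in the torso.")
critique = chat("List any physical or logical inconsistencies in this passage:\n\n" + draft)
final = chat("Rewrite the passage, fixing these issues.\n\nPassage:\n" + draft
             + "\n\nIssues:\n" + critique)
print(final)

Whether the extra passes are worth the added latency is another question.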
>>
>still no way to play videogames with AI
I can't wait for the future
>>
>>102176378
Some reddit guy wrote an extension for that. It makes the AI write like 4 analysis posts and then uses all that guff to improve the actual reply to roleplay.

People seem to love it but quadrupling response times seems silly to me.
>>
>>102176383
play how?
>>
>>102176419
Actual, like a second player, play. I know there's supposed to be some new apps that "watch" you play, and nvidia is working on one too, but surely that doesn't work well.
>>
>>102175814
The smartest model will have the beginning of the sentence have hundreds of equally probable tokens. I mean when it realizes it is writing smut of course.
>>
>>102175851
You don't have to buy an ad Undi. You just have to post under your trip so we bully you.
>>
>gooned for 2 weeks straight
>look back and realize how much handholding I did to get the model to say what I want
Yup I think I am done for now. See ya in 6 months.
>>
>>102176437
LLMs handle text. Even if you gave them video input, they wouldn't be able to control the game very well.
Most AI playing games is neural nets trained specifically on that game.
Maybe would work if a model comes out or is finetuned with a mouse+keyboard output modality, I guess.
>>
>>102175774
Tested it for a few minutes with largestral. While rerolls are now different each time, I can't really say that it didn't impact the model's smarts. Might be better with different settings than the default ones, but it certainly has its drawbacks, contrary to what the author says.
>>
>>102174718
How do they figure this out?
>>
>>102176532
>LLMs handle text.
Tokens, not text.
>Even if you gave them video input, they wouldn't be able to control the game very well.
Someone hasn't seen the latest 'Can it run DOOM?' paper.
https://gamengen.github.io/
>>
>>102176647
They never said it doesn't make the model dumber, just that it doesn't make it incoherent. ;)
>>
>>102176745
Retard. Have you read it? It's not playing a game. It's generating video for what a game might look like.
It's a game engine, not an LLM that plays games. Next time think before posting you fucking cretin.
>>
>>102176686
Through research. People try shit and see what works and what doesn't work. Then release papers about their findings.
Things would move much faster if bitches like OpenAI still released papers...
>>
Does everyone else frequently use the "Start Reply With" box? One of the things I find interesting is that if you give a model a full sentence, it might just send EOS back. If you cut off an ending punctuation, it's like 50-50 on simply adding the punctuation and then EOS. If you cut off a few more words, it is more likely to keep going afterward where appropriate.

I've also noticed something that could MAYBE be a coincidence, where if I write a certain response for the AI and then go back and have it write from scratch, it seems to do what I wrote to some degree. For example, I had it generate a character sheet and it said "Exp: 427." I changed the number to 387 for various reasons, then decided to regenerate from scratch. It then generated 387. And this isn't the first time I've seen this behavior. I could imagine this being because it affects the prompt processed somehow, but I could be totally wrong.
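You can poke at the first part directly against llama.cpp's native completion endpoint: send the raw prompt with the partial reply already appended and watch where it picks up. Sketch (assumes llama-server on localhost:8080 and a Llama-3-style template; adjust to your model's format):

import requests

# the assistant turn is already "started", like Start Reply With
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Write a character sheet for a level 5 rogue.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "Name: Vex\nClass: Rogue\nExp: 3"   # cut off mid-number on purpose
)
r = requests.post("http://localhost:8080/completion",
                  json={"prompt": prompt, "n_predict": 64, "temperature": 0.7})
print(r.json()["content"])  # continuation resumes from the partial "3..."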
>>
>>102176745
>>102176787
NTA but here's one that actually plays games:
https://www.theverge.com/2024/3/13/24099024/google-deepmind-ai-agent-sima-video-games
https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/
>>
It's time to face the facts: A model is not worth using if it's not trained on ChatML
>>
File: file.png (139 KB, 1273x562)
>>102176902
>The SIMA agent maps visual observations and language instructions to keyboard-and-mouse actions
(Figure 4).
Thanks. This is what I was saying about needing a mouse+keyboard output modality, but didn't realize it had already been done. Shame they didn't release the code or model.
>>
Has anyone managed to extract the full potential of the new cr+ yet? I tried it and it didn't feel much different than the old one so I figured they changed something about the prompt format that I'm missing.
>>
>>102177023
You can squeeze that stone all you want, nothing's gonna come out. Aside from some rag improvements it's actually worse in every other aspect.
>>
is the P40 worth $300? i like the idea of grabbing a couple of these because they would fit in my current 2U Epyc server

i really regret not grabbing one of these in march when they were cheaper...
>>
File: eternally sixteen.png (832 KB, 1206x1623)
hagtsune miku love
>>
>>102174994
The python bindings support that but it just wasn't in the script. Here it is with separate options for the main and draft model for the split mode, main gpu, and tensor split settings.
https://pastebin.com/sCjixz4T
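For anyone adapting it by hand rather than using the script, the relevant llama-cpp-python knobs look roughly like this (sketch of the kwargs only; how the draft gets attached to the server is up to the script; filenames are placeholders):

import llama_cpp
from llama_cpp import Llama

# big model: spread across both GPUs, the rest spills to RAM
main = Llama(
    model_path="Mistral-Large-Instruct-2407.Q3_K_S.gguf",
    n_gpu_layers=50,
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,
    tensor_split=[0.6, 0.4],   # proportion of layers per GPU
)

# draft model: pinned entirely to the second GPU
draft = Llama(
    model_path="Mistral-7B-Instruct-v0.3.Q5_K_M.gguf",
    n_gpu_layers=-1,
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_NONE,
    main_gpu=1,
)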
>>
Bros....

I think I just fell in love
https://cerebras.vercel.app/
>>
>>102167465
NEAT!

Does it work with ComfyUI?
>>
>>102177051
spending money for this is not worth it in any facet. you can spend 10k and still will have the same exact issues you will have if you spent nothing whatsoever and just used nemo. vramlet models have been getting better and better. there's no good 70b models(unless you wanna use stale miqu). it's only 12b, 27b, and 100b+.
>>
>>102177152
>there's no good 70b models(unless you wanna use stale miqu). it's only 12b, 27b, and 100b+.
Is this your attempt at trolling or coping?
>>
>>102177162
i have no reason to "cope" or troll. i don't even use vramlet models, i use largestral only. tell me one good 70b model right now. if you mention anything l3 related i insta know you're retarded and ignore everything you say.
>>
>>102177137
where can I download this model?
>>
A100 for $6500? What is this chink faggotry?
https://www.ebay.com/itm/286038607853
>>
>>102177240
https://dronwy.com/products/nvidia-dgx-station-a100-80gb
640GB VRAM for only $10k. Who here wants to take the risk?
>>
>>102177269
一分钱一分货 (you get what you pay for), motherfucker
>>
File: 2024-08-31_00016_.png (1.23 MB, 1280x720)
>>102167373
The war on the boobie continues, but we fight valiantly.
>>
>>102175231
That worked, I'm using Largestral Q3_K_S + Nemo Q5_K_S.
The speed gains aren't that impressive though, it's just maybe 30% faster.
>>
>>102177301
I'd have put this in /ldg/, but they don't understand victory is possible. Well, it might be. I think that there is censorship in the clip system.
>>
>>102177240
>normal ebay account selling random shit like bags and trousers
>suddenly there's 20 auctions advertising too-cheap-to-be-true tech offers at some weird website
That account got clearly hacked.
>>
>>102177333
not to mention that those auctions can be bid on so whoever put them up doesn't care about the price they score
it's 100% a cracked account
>>
>>102177333
>NOOOOO you can't like bags, trousers, AND gpus! If you use local mikus you must be naked at all times!!!!!!!!!!
>>
>>102177366
That's not the point, mouthbreather.
>>
>>102177366
>If you use local mikus you must be naked at all times!!!!!!!!!!
My dick is always out. It's just easier this way.
>>
>>102177394
put some pants on faggot
>>
>>102177572
clothes are unnatural, eden style is best
>>
>>102175765
What about Lyra
>>
>>102177572
I just hate them, they're so constricting! I mean does a lion wear clothes? And the lion is the king of the jungle! So why can't I, a humble citizen, go naked!?!
>>
File: 1000019370.jpg (202 KB, 2400x1080)
>>102170867
Alright I had a sleep on it and I think I can make this work.
It needs 2x PCIe x8 and 3 6-pin 12v connectors. My PSU can handle it but I'll still probably run the board from a second PSU, but I was thinking..

D'you reckon I can run my 6900xt off the M.2 CPU slot?
>>
>>102169541
>I legit cannot tell the difference between the 33B sized models and 12B ones.
I used to think I knew the difference between 7B and anything else sub 70, but then my assumptions got run over by a Beagle. https://huggingface.co/TheBloke/UNA-TheBeagle-7B-v1-GGUF if you're interested. Nearly left the url out, but on reflection, schizos can fuck themselves with broken glass.
>>
if y'all never buy any ads this whole site is gonna shut down you know
>>
>>102177780
they're selling passes on a discount trying to make up for lost ad revenue
>>
>>102169541
The issue isn't ram, but better inferencing engines.
>>
>>102177870
The issue is waiting for the big bitnet models to finish training. Llama 4 is still a couple months out probably.
>>
>>102177780
Please oh God please please let this place die. I will shitpost twice as hard if only that becomes a reality.
>>
>>102177953
>have enough VRAM to run 1.58bpw bitnet 405B at Q8
Feels good man.
>>
deepseek16b lite, codegeex4, codestral, does anyone here know any other good models for programming (python)?
>>
>>102177962
Why? What is wrong with you, exactly? Why do you need to arbitrarily destroy things that other people derive value from?
>>
>>102176479
The only people that did the "bullying" are other fine-tuners like Sao and co. because they're that kind of pathetic. You're transparent.
>>
>>102178168
Hi all, Drummer here...

You got me.
>>
File: file.png (18 KB, 150x544)
>>102171482
I tried to print the tokens being generated by the draft model (using "self.model.detokenize") but the decoded tokens are garbage, even though I can see the draft model working properly in ST. Weird.
>>
>>102177780
>if y'all never buy any ads this whole site is gonna shut down you know
What should I advertise?
>>
Is AIDOOM a case of grifters using Stable Diffusion to produce Doom screenshots in sequence, and then telling investors "LOL AI DO END TO END GAME DEV NAO." ??
>>
>>102171482
>llamacpp python
Anyone know what cuda, gcc, and glibc you're supposed to compile this with? It's failing on my machine.
>>
>>102178184
Been enjoying Rocinante. Degenerate and rather light on the slop. Quants below 8 are retarded though sadly.
>>
>>102178284
just use a pre-compiled wheel
>>
>>102178265
Doom is just a renderer producing screenshots in sequence.
Not trying to be pedantic, the jump from image gen to "AIDOOM" isn't that big.
>>
>>102178316
I wish I could understand the point of that. Maybe I'm jaded, but it seems boring. Haven't we already proven than AI generated video exists?
>>
>>102178316
It's kind of big. Conceptually maybe no.
>>
File: 1667096200472680.jpg (143 KB, 600x705)
what are the top 100 local llms for solving complex puzzles? i got a 1050 and 16bg of ram
>>
>>102178333
The point is you can clone games based on Twitch. ie this is a huge income stream disrupter. One day.
>>
>>102177780
I'm looking forward to the next stupid fucking meme you monkies latch onto at this point. "Buy an ad" has worn out its welcome.
>>
>>102178355
they used an agent to play the game, they didn't scrape videos from twitch. i think it will never be viable to just use videos from twitch.
>>
>102178355
Are people here this retarded? I expected this kind of retardation in /v/ thread but not here.
>>
>>102178374
t. streamer
>>
xtc seems worse than dry
>>
>>102178405
>excluding the best tokens makes it worse
shocker
>>
I am the best idea guy here (I am too lazy to write code for a bit of trolling). How about a sokal sampler that picks random tokens and randomly changes their probability? Bonus points for the random change being 1% max.

https://en.wikipedia.org/wiki/Sokal_affair
>>
>>102178333
Little difference between jaded and skeptical, either is healthy.
The potential is you can just prompt a video game. I like to run my thought experiments from the vantage of a bedridden paraplegic with the tech to improve their quality of life (desu probably where the general population is headed in 10-20 years).
If I could sit in my care home and prompt up my ideal vidya, complete with AI companions, love, sex etc... I mean I'd just jack in and insert a feeding tube.
Always hits me how much all of this image gen/LLM stuff smacks of vivid dreams. I do love dreams.
>>
>>102178374
nta, but yes some of us are this retarded
>>
>>102178366
Yeah, but once you have the fundamentals down (WASD moves, shootan, physics and shit) you can conceivably just prompt shit into the game.
Obviously not that easy *right* now, but a handful of months? Sure.
>>
>>102172232
Nemo in fp16 doesn't quite fit on my 3090, I have to put a couple layers on the CPU. It seemed to help a bit, maybe I'm totally nuts.
>>
>>102178366
:^)

but, if it happens, imagine Twitch has drm but some people hack it and so it's like with movies where people record those in the theater "camming" - but now people can "cam" vidya releases.
>>
>>102178497
Stop pretending to be this retarded.
>>
>>102178529
The future is nearer than you think.

scotus will rule that drawing diagrams of how apps work counts as a valid violation of the toss, and you'll go to prison.
>>
>>102178481
something that might actually be possible would be using the full idgames doom maps archive to train a doom map maker AI, or maybe even a doom game that generates rooms on the fly
>>
>>102178546
STOP! IT HURTS!
>>
>>102178496
BLAS is only used for prompt processing, but if you have a GPU, the GPU will be used for prompt processing instead. it will not help with generation speed regardless.
>>
>>102178554
AIDOOM generates rooms on the fly, it's like watching a dream. Sometimes the player does a 360 and when they look back, there's a completely different room there.
It has object/environment permanence but it's sketch as fuck.
A map maker AI would be neat. I think training the AI to read maps via API/image identification would reduce training overhead and increase stability of the environment.
Brute forcing everything through AI number crunching shouldn't really be the answer, imo.
>>
>>102178496
Have you tried it at 8 bit already? Can't imagine there's too much difference (unless you found a usecase where there is one)
>>
Did the canucks clean up nsfw from all their training? I keep trying to make commander work and it gives me the vibes of llama3. As in it is desperately trying to generalize smut out of zero examples.
>>
>>102178357
>t. finetuner that wants to astroturf his model
>>
>>102178601
>Sometimes the player does a 360 and when they look back, there's a completely different room there.
i want to play this game now
>>
>>102177780
I do a lot of buy an ad posts cause I believe in reverse psychology.
>>
>>102178613
All model makers converge to dry assistant style, because corporate is where the money is.
>>
>>102178262
It seems like the draft might be using a different vocabulary/tokenizer so it likely sees the prompt as junk and generates equally junk outputs. It'll still technically "work" even if all the draft's predictions are wrong — just slower than running without it from the overhead of checking all the bad predictions while generating with the full-sized model.
>>
>>102178638
It's endlessly funny to me how Anthropic was founded by corpocucks who left OpenAI because they unironically thought GPT wasn't pozzed enough...and then they ended up creating one of the horniest and most unhinged models ever
>>
best model for simulating friendship?
>>
>>102178840
Human-100T
>>
>>102178815
It was just a facade.
>>
Is a 2 bit quant of a 120b model better than a 4 bit quant of a 70b model?
>>
File: temp0.webm (597 KB, 974x330)
>>102171482
I think this script's server ignores the generation parameters from the API
>>
>>102167373
>(08/29) Qwen2-VL 2B & 7B image+video models released
When are they going to release Qwen2-VL-72B?
>>
>>102178964
it has been released. api only :^)
>>
>>102178768
That... makes sense. When I used different quants of the same model it decoded correctly. I thought Nemo and Largestral used the same tokenizer...
>>
>>102178284
I have the cuda toolkit 12.5.1, but I'm sure the newer one will work, and gcc-12.4.
>>
>>102178964
from the qwen discord on the topic of releasing it:
>No confirmed plan but who knows:bob_the_builder:
>Yay we talked about it in the vl channel. No eta actually for 72b
sounds like they are unlikely to release it but it's possible
>>
>>102178992
Yeah Nemo is different from Largestral. Mistral v0.3 does use the same tokenizer though.
>>
>>102178962
Hmm, it definitely works for me on SillyTavern at the /v1 endpoint with the Default/OpenAI-style API type set, at least. E.g. cranking up the temp and disabling guardrails turns it into gibberish, and then adding Min-P at 0.05 in with the same temp makes it coherent.
>>
Command R is actually decent with picking up in a story. I have a chat with about 40k tokens already from another model and Command R adapted to it no problem.
>>
>>102178866
She's too fat. Gotta find something that's a LOCAL MODEL
>>
I've been out of the loop for a while. Has anything surpassed midnight miqu for 70b? I'll take a good 8x7b too.
>>
>>102179547
Go fuck yourself, shill.
>>
>>102179547
No but you're about to get a bunch of shill responses desperately trying to convince you otherwise so they can limit their ad spend.
>>
quickest local model to run on a 4090 that is instruction but wont refuse anything?
>>
>>102179595
Wow, you were instantly proven right.
>>
>>102179601
Qwen 0.5B Instruct
>>
Smut finetunes of Mistral Large are literally all I need now. It almost never fucks up world modelling or says anything nonsensical.

I just wish it wasn't so slow. I've got 36GB VRAM but I still pull less than 1t/s on the IQ3_M.
>>
File: file.png (720 KB, 768x768)
>>
File: file.png (38 KB, 615x490)
>>102179601
>>
>>102179685
this is a lot worse than before
>>
>>102179265
I take it you are talking about nonsexual?
>>
>>102179714
don't reply to petra
>>
>>102179719
Obviously? I know we joke around here a lot about that sort of thing but it's not like we are actually sitting here trying to use LLMs for sexual gratification, lol.
>>
>>102179810
Y-yeah of course n-not
>>
>>102179805