/g/ - Technology


Thread archived.
You cannot reply anymore.




File: the_lmg_files.png (2.37 MB, 2048x1568)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102458057 & >>102449993

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102458057

--Paper: FP8 training advancements and potential effects of quantization: >>102465401 >>102465566
--Papers: >>102465251
--NovelAI's Unified Sampling and contributions to AI modeling: >>102463739 >>102463901 >>102464022
--Microsoft GRIN-MoE model release causes trouble for LiyuanLucasLiu: >>102461537 >>102461608 >>102461689 >>102461686 >>102461692 >>102461707 >>102461739 >>102461803 >>102462036 >>102461706 >>102462008 >>102461831 >>102461878 >>102461946 >>102462017 >>102463094 >>102463137 >>102463195
--Hugging Face quantized Llama 3.1 to b1.58, but performance is degraded: >>102458690 >>102458714 >>102458718
--How to access chatbots on desktop from laptop on same home network: >>102460877 >>102460923 >>102460932 >>102461001
--Smaller Whisper versions offer good audio transcription with less VRAM requirements: >>102458067 >>102458093 >>102458142 >>102458276
--Qwen series models criticized for censorship and poor real world performance: >>102461968 >>102462451 >>102462579 >>102462662 >>102462973 >>102463008 >>102463285 >>102463656 >>102463154 >>102462692 >>102462543 >>102462591 >>102462640 >>102462672 >>102462674 >>102462669
--Llama-quantize issue with uppercase formats: >>102458630 >>102458783 >>102459076 >>102461938 >>102464789 >>102465926 >>102465644
--OpenVINO release suggests Llama3 may work on Lunar Lake NPU: >>102458964
--LoRA training issue: Large layers dominate learning, leading to overtraining: >>102467056
--Explanation of different text formats and their importance in LLMs: >>102465678 >>102465763 >>102465772 >>102465797 >>102465860
--Virtual friend creation discussion: >>102465155 >>102465190 >>102465367 >>102465825 >>102465948 >>102465965
--Qwen2.5-72b-Instruct unsuitable for ERP due to censorship and character inconsistency: >>102458681 >>102458779 >>102459033 >>102459166
--Miku (free space): >>102458063 >>102459196 >>102464415

►Recent Highlight Posts from the Previous Thread: >>102458061
>>
File: 46 Days Until November 5.png (2.32 MB, 1104x1472)
>>
>>102467609
thanks, space miku
>>
File: 1.png (1.3 MB, 1080x1182)
>Omnigen
https://arxiv.org/pdf/2409.11340
>Notably, our model has only 3.8 billion parameters, whereas the SD3 model has a total of 12.7 billion parameters (more than three times that of ours).
As far as I understand, they have a built-in LLM.
Seems too good to be true. Is this real?
Gonna post a couple examples from the paper.
>>
File: 2.png (1.47 MB, 1080x1669)
>>102467639
>>
>publication date March 2023
>>
File: 3.png (2.54 MB, 1080x1739)
>>102467647
>>
File: 4.png (1.56 MB, 1080x1553)
>>102467657
>>
File: 5.png (2.58 MB, 2619x1245)
>>102467665
>>
File: 6.png (1.06 MB, 1056x1171)
>>102467674
Last pic. Failure cases
>>
File: 1678741433645086.png (10 KB, 259x288)
>posted at the end of the thread again like an idiot
Do any of the CPU-oriented backends make any decent use of AVX512?
>>
>>102467665
>>102467657
>>102467647
>>102467639
>beijing
>no code
>no weights
I have a fairy in my basement that can weave mice into gold
>>
>>102467639
>Omnigen supports arbitrarily interleaved text and image inputs as conditions to guide image generation, rather than text-only or image-only conditions
Interesting.
>>
Is there anything that can run on a 2001 Intel Atom laptop with 2 gigs of RAM? Intended use case is just creating some silly tomfoolery like AI Dungeon.

>>102467604
Anchoring. No plan on doing anything local on my better machines. Do I really need to go full /x/ and do SitM conjuration ritual or something?
>>
Oi cuda anon, are the 3bpw I-quants unoptimized or something? Mistral large IQ4XS runs a lot faster than IQ3M, which I'm guessing is due to the alignment (3 bits vs 4 bits and all)
Are there any optimizations in the works? Are those even possible?
>>
>>102467604
>>102467622
Nice Mikus
>>
>>102467685
llamacpp detects AVX instruction sets when compiling, but I have no idea if they make "good" use of them
>>
>>102467709
Patience, laowai
https://github.com/VectorSpaceLab/OmniGen
>>
File: sad-hamster-hampter.gif (109 KB, 498x471)
Okay so 4chan deleted my comment.

What's the latest and greatest in the 30-ish b range? Seems like a very neglected space.
>>
>>102467880
There are only 2 good models, Nemo and Largestral
>>
>>102467880
It's in the OP, the first item in the news.
>>
>>102467861
zhengliu is microsoft.
lets see if they let them release it.
they are seriously cockblocking their chinese autists lately.

>>102467880
gemma 27b i guess? do you mean for RP? then you need to go smaller, mistral small or nemo with higher context in return.
>>
>>102467734
A 2001 Atom laptop would struggle to play a video of someone using a chatbot.
>>
>>102467880
>30-ish b range
Qwen 2.5
>>
>>102467639
>In this paper, we use the VAE from SDXL [52] and freeze it during training. We use Phi-3 [1] to initialize the transformer model, inheriting its excellent text processing capabilities.
>Unlike state-of-the-art diffusion models that require additional encoders to preprocess conditional information (such as clip text encoder and image encoder), OmniGen inherently encodes conditional information by itself, significantly simplifying the pipeline.
So this is SDXL stitched together with Phi-3? That's funny. Pretty cool.
>>
>>102467604
sex
with miku
>>
In the OP's programming benchmark Gemma 27B is one of the top performers; is it really good? And is greedy sampling, like the benchmark uses, best for programming?
>>
>>102467861
>cOmInG sOoN!!!
>>102467653
>>
What's the latest and greatest in base text continuation models?
>>
>>102467765
Which backend?
Prompt processing or token generation?
>>
>>102468060
Last time, you claimed similar things about Mini-Omni, but they ultimately released their dataset.
>>
>>102467861
That reminds me, WHY the FUCK did they kill chameleon again?
>>
>>102468160
I don’t even know what mini Omni is
I’ve just spent the last week going through the literature on gaussian splats and being annoyed because 80% of the papers that mention having code have no code
>>
>>102467940
>>102467897
>>102467895
>>102467894
Thanks Anons, Qwen seems great on paper, I'll give it a spin and compare, cheers!
>>
>>102468126
Regular ol' llamacpp, freshly installed and compiled about 3 days ago
It was token generation, I was using mikupad to connect to my remote pc running llamacpp and IQ4XS ran at about 1T/s whereas IQ3M ran at ~0.8-0.85T/s, and that's with more layers offloaded
IQ2M runs at ~1.3T/s, which is expected
>>
>>102468294
But anon, releasing it to the public is too dangerous! As if https://github.com/hacksider/Deep-Live-Cam didn't exist already and present a far greater threat.
>>
>>102468348
By backend I meant which backend in llama.cpp (CPU, CUDA, Vulkan, etc.).
But regardless of which backend you're actually using the reason is going to be the CPU code.
>>
File: file.png (136 KB, 3427x437)
>>102467897
>zhengliu is microsoft.
so it's over, microsoft is the most cucked company when it comes to AI
https://huggingface.co/microsoft/GRIN-MoE/discussions/1
>>
>>102468356
>But anon, releasing it to the public is too dangerous!
I don't get it, if they truly believe it's dangerous, why make it in the first place? That's Oppenheimer-level retardation
https://www.youtube.com/watch?v=2x_Pv6v2quw
>>
There is a pretty big influx of anons asking for model recommendations under 30b. I wonder what's happening.
>>
>>102468501
>I wonder what's happening.
Nvidia and Sam Altman came to their home to confiscate their 3090, you can never be too careful and safe nowadays with AI :^)
>>
>>102468418
Oh my bad, it's CUDA
>>
>>102468490
Especially at M$, other people determine whether researchers can release their models.
>>
File: file.png (768 KB, 2346x1430)
>Scaling FP8 training to trillion-token LLMs
https://arxiv.org/abs/2409.12517
I don't get it, fp8 is diverging on "normal training" yet we managed to make BitNet with the same training method as fp16, that's so weird
>>
>>102468472
>This bugman sacrificed himself for this junk instead of their fucking native multimodal model
What an absolute waste
>>
>>102468616
IIRC, BitNet still used high precision for the activations, which is where the instability issues arise.
>>
>be me
>running mistral large (and merges) at low quants
>try qwen 2.5 at IQ4XS
>both aren't bad, but a bit slow
>try cydonia q8
>"hey, it's actually dec-"
>picrel
That's it drummer, you're on the rape list
>>
>>102468839
Rocinante v1.1 was a fluke
>>
you guys realize it's not your computer talking to you but indian dudes being paid 1 dollar an hour to roleplay, right? you are cyber sexing with creepy indian dudes
>>
>>102468839
I also received this kind of message from Mistral Large.
>>
>>102468884
This is /lmg/ Sama try /aicg/
>>
>>102468884
/local/, dude.
>>
BitNet will basically never be a thing. The only moat companies have is their hardware and they're aware of this.
>>
>>102468916
don't care, still huffing my hopium until it's proven BitNet doesn't scale well; so far it's been shown to work well up to 3.7B
>>
>>102468893
The horrifying secret of computers.
>>
>>102468885
Damn, really? I never got it, but then again that's basically the first time a model spat in my face
>>102468952
I mean even if it breaks down after like 5B, can't people just make a 60x3B MoE or something? Maybe we'll get proper distributed training code for an /lmg/-approved, slop-free, gpt-ism-less 5T BitNet noCot model soon
>>
>>102465155
>>102465190
>>102465367
>>102465825
>>102465948
>>102465965

I haven't gotten around to doing real work on it, but the way I figure it, start with a model that is good at roleplay, and good at function calling.

Here's the unfortunate truth though: little tricks like rolling context or context caching won't work. The context has to be exactly that, a context. It'll include recent chat, obviously, but the AI will also be given the task of running a function to add whatever it wants to a vector database, and every completion will include a lookup of the 5 to 10 most relevant vector queries, in chronological order (mention it's chronological, so it can understand that shit like "I painted my car blue" then "I painted my car pink" happened one after the other)

Include any visual context (last 5 seconds of video capture for example) and, if you want to treat this like a vtuber/neurosama, ask it explicitly if it's even the correct time to actually do anything yet, or just wait.

If you go with the neuro-sama/vtuber assistant route, either trigger a completion right after someone finishes speaking (if voice recognition) or on a time out if no one has said anything for a while (with a reminder that the AI doesn't HAVE to do anything and can just no-op)
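The chronological-recall part of the above can be sketched like this; it's a toy bag-of-words embedding standing in for a real embedding model + vector DB, and all the names (ChronoMemory, recall, etc.) are made up for illustration:

```python
import math
from collections import Counter

class ChronoMemory:
    """Toy memory store. Embeddings are bag-of-words Counters; a real
    setup would use an embedding model plus an actual vector database."""
    def __init__(self):
        self._items = []  # (insertion_index, text, vector), in arrival order

    @staticmethod
    def _embed(text):
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, text):
        self._items.append((len(self._items), text, self._embed(text)))

    def recall(self, query, k=5):
        # Rank by similarity, then return the top-k in chronological
        # order, so "painted my car blue" precedes "painted my car pink".
        qv = self._embed(query)
        ranked = sorted(self._items, key=lambda it: self._cosine(it[2], qv),
                        reverse=True)[:k]
        return [text for _, text, _ in sorted(ranked)]

mem = ChronoMemory()
mem.add("I painted my car blue")
mem.add("I adopted a cat named Mochi")
mem.add("I painted my car pink")
print(mem.recall("what color is my car", k=2))
```

The important bit is the final re-sort by insertion index: relevance picks the memories, chronology orders them.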
>>
>>102469111
>the AI will be given the task of running a function to add whatever it wants to a vector database
This is basically what replika did with Stone Age models without the vectors. It isn’t hard at all, at all.
>>
>>102467604
I am looking to buy a new GPU to feed my coom addiction
Should I buy a 16GB 7600XT or 12GB 3060
I live in a SEAnigger country so both are like 50% more expensive than American prices
Is CUBLAST really that much better than CLBAST?
>>
File: 4chinsummary.webm (2.21 MB, 1902x878)
>>102467604
Numerous webdev trash tier companies are making worthless """""web apps""""" by basically stitching together LLM APIs
What are (You) doing to make a product out of your LLM skillz and make dough?

For the life of me, I cannot care about making a product and making money. Guess I'm retarded. What do you guys use LLMs for?
I'm the same guy who made the fully automatic LLM+SD incest caption image generator. I also went ahead and made a Chrome extension to summarise 4chinchin threads for me. I'm kinda burned out at my day job
>>
>>102469647
>What do you guys use LLMs for?
porn
>>
>>102469590
what's your current gpu?
>>
>>102469663
RTX 2060 6GB
I want to run at least 16B models. 7B are still pretty good for cooming but not very smart. Like a bimbo
>>
>>102468839
>>102468885
Their format templates use a fuck ton of []. "[" basically needs to be a stop token for all Mistral models because everything it puts in brackets is guaranteed to be in assistant mode.
>>
>>102469670
>>102469663
>>102469590
Also what about Intel A770 16GB for running LLMs?
I can get 7600XT and A770 for almost the same price
>>
>>102469647
>For the life of me, I cannot care about making a product and making money.
Being desperate and out of options helps
Most of them are made up, but for the few “I was living out of my car, I was in debt to the mob” stories that are true the takeaway shouldn’t be that those people are super cool but that being at the edge of a financial cliff is the most highly motivated you’ll ever be in your life
>>
>>102469673
What a shame, I wanted to use [] for thoughts in ST, Qwen understood it
Come to think of it, how do you guys usually format your shit? I'm using the default "" for dialogue and ** for (character) narration, but I want to separate thoughts from the narration and <> seems to just hide it
>>
give me a lewd model for i5-5600k 12gb ram
>>
>>102469590
>Is CUBLAST really that much better than CLBAST?
Inference is around 10 to 20% faster on CUBLAS compared to CLBLAST.
>>102469670
You can run 16B very easily on 12GB, just make sure you have enough RAM to fill the gap.
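Rough back-of-the-envelope for how many layers fit on the card when you split a model between VRAM and RAM; the function, the reserve figure, and the 12B example numbers are all illustrative guesses, not any real tool's formula:

```python
def layers_on_gpu(model_gb, n_layers, vram_gb, reserve_gb=1.5):
    """Estimate offloadable layers: assume weight bytes split evenly
    per layer; the reserve loosely covers KV cache + driver overhead.
    Real usage varies a lot with context size and backend."""
    per_layer = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable // per_layer))

# e.g. a 12B model at ~Q4 (roughly a 7.5 GB file, 40 layers) on a 12 GB card
print(layers_on_gpu(7.5, 40, 12.0))  # → 40, i.e. the whole thing fits
```

Anything that doesn't fit just spills to system RAM, which is why having enough RAM to "fill the gap" matters.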
>>
>>102469719
https://huggingface.co/Lewdiculous/Lumimaid-v0.2-12B-GGUF-IQ-Imatrix
>>
>>102469670
>16B models.
no such thing, unless you're going over 24gb vram you'll be stuck with mistral nemo which is 12b and can fit in 12gb. 16gb will be useful for image gen and maybe playing a game while gooning or something
>>102469698
sorry no clue about non nvidia gpu performance. until you get 24gb of vram all you can fit in your gpu right now is mistral nemo and under, so you'll be running the same tier of models with 12gb of vram as you would with 16gb of vram, assuming you don't want to offload to cpu.
>>
>>102469747
>>102469736
Fug I meant 13B models, not 16B
Although if I can run higher tier models, that's still great
>>
>>102469766
You can run 20B models, even.
Learn how to make your own quants, that'll help a lot in the future.
>>
>>102469707
I guess you're right. If someone asked me if I need money, I'd probably answer yes, but the truth is I just want money. And not really that much either
>>
>>102469590
>CLBAST
Don't use that, use either CUDA, the HIP port of the CUDA code to AMD ("ROCm"), or Vulkan.

>>102469698
There is llama.cpp support for Intel GPUs via SYCL but I don't know how well it performs.
>>
>>102468868
A fluke but there's also something weird going on with his quants.
If you're in here drummer why is rocinante q6 so retarded vs q8? I don't get it.
>>
>>102469715
Author's text.
"Spoken speech."
*Thoughts.*

Autists on chub use `` for thoughts. And it looks like shit.
>>
Hi all, Drummer here...

>>102469968
Have you tried other quant sources like bartowski? Or imatrix quants?

>>102468839
In RP, you mean? I always wanted to be a card.

I hope you're all enjoying my recent tunes like Rocinante, Donnager, and Cydonia!
>>
>>102470236
I'll fire up the quants again and figure out exactly which one is retarded. I've got bartowski's and i quants 8 through 4 downloaded already on shitty australian internet.
>>
>>102470329
>I've got bartowski's and i quants 8 through 4 downloaded
Even on slow internet, there's a point when downloading the original model and quanting yourself seems more reasonable. Little late for that, but you will end up saving bandwidth and you can always requant if something changes on llama.cpp.
>>
>>102470367
Yeah I was just getting used to the whole thing at the time.
Smarter now.
>>
All open source AI companies make slop, from assistant slop to filtered slop. Open source is a joke. Imagine spending thousands to run this garbage.
>>
>>102470497
>gobble gobble gobble
>>
>>102470497
You will never be a womanslop
>>
>>102470236

I called it a fluke but the model is still amazing, thank you.
>>
>>102468952
It's not that it doesn't work well it's just that no one has any reason to do it because the only moat corpos have is hardware at this point
>>
>>102470497
aw, did your ad get rejected?
>>
is loss of .16 enough on a full fine tune?
if not what is the good stopping point?
>>
>>102470555
BitNet is an insane moat though, it means you could run a 70b model at "full precision" on just one 3090 card
>>
File: file.png (54 KB, 688x280)
Does anyone remember KAN?
>>100261650
>Notably, they claim it only takes ~300 parameters to get the same test set accuracy as a ~300,000 parameter network that uses MLPs
Paper came out a few days ago which integrates KAN into transformers.
https://arxiv.org/abs/2409.10594
https://github.com/Adamdad/kat
https://github.com/Adamdad/rational_kat_cu
>>
>>102470569
What performs well for your application is a good stopping point. Disregard the "overfitting is le bad" folks, but make sure to test your model at various checkpoints, checking for excessive confidence in the token selection, unusual logic, general vibe.
>>
>>102470591
>~300 parameters to get the same test set accuracy as a ~300,000 parameter network that uses MLPs
big if true, that's a 1000x parameter reduction, and I always had this in mind that we don't actually need fucking billions of parameters to get the actual results we have
>>
so hold on : qwen2.5 is letarded chinkshit, mistral small is a sovlless nemo and GRIN-MoE is doa
what the fuck is there left for us /lmg/sissies???
>>
>>102470626
Laugh at Sama seething on twitter
>>
>>102470632
Don't give the mentally ill freak attention.
>>
>>102470573
>could run a 70b model at "full precision" on just one 3090 card
In theory yes, in practice no, because actual BitNet models trained like the papers suggest keep the attention, embeddings and output layers in high precision.
>>
>>102470602
technically I want to overfit right now
im training on several stages in on this stage most of the text is labeled as don't train
the few parts that are masked to be trained have to be exact matches.
so I guess my question would be at what loss is the model considered overfitted?
>>
>>102470661
Overfitting technically occurs when the evaluation loss calculated over a representative held-out portion of your dataset increases above its minimum point.
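In code, that definition amounts to keeping the checkpoint at the eval-loss minimum and stopping once the loss has been climbing away from it; a minimal sketch (function name and patience value are arbitrary):

```python
def best_checkpoint(eval_losses, patience=3):
    """Return the index of the checkpoint to keep: training has
    overfit once eval loss has failed to improve for `patience`
    consecutive evaluations past its minimum."""
    best_i, best = 0, float("inf")
    for i, loss in enumerate(eval_losses):
        if loss < best:
            best_i, best = i, loss
        elif i - best_i >= patience:
            break  # eval loss rising past its minimum: overfitting
    return best_i

# eval loss dips then climbs: the minimum is at checkpoint 3
print(best_checkpoint([1.2, 0.9, 0.75, 0.70, 0.72, 0.78, 0.85]))  # → 3
```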
>>
>>102470654
so it's not a real 1.58bit model? what value would it be in reality then?
>>
>>102470714
I guess somewhere above 2 bpw *on average*. Most of the model weights are the MLP layers anyway, which is what BitNet appears to mainly target.
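Back-of-the-envelope for that "above 2 bpw on average" figure; the 90% ternary share is my guess at how much of the model the MLP layers make up, not a number from the paper:

```python
import math

def avg_bpw(ternary_frac, high_prec_bits=16):
    """Average bits per weight when only a fraction of the weights are
    ternary (log2(3) ~= 1.58 bits) and the rest (attention, embeddings,
    output head) stay at high precision."""
    ternary_bits = math.log2(3)  # ~1.585
    return ternary_frac * ternary_bits + (1 - ternary_frac) * high_prec_bits

# if ~90% of weights (mostly MLP) go ternary and 10% stay fp16:
print(round(avg_bpw(0.90), 2))  # → 3.03
```

Keeping the leftovers in fp8 instead of fp16 would pull the average closer to 2 bpw.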
>>
>>102470849
so it's still a big deal, if you tell me I can run a 2bit model that is as accurate as bf16 I sign it as fast as sanic
>>
File: memequant-ppl.png (16 KB, 582x201)
///UPDATE ON MEMEQUANTS///
>>102465926
>Missed your post. I think the original claim was that Q8 or fp16 for those layers makes a large difference on small quants (and maybe only on Gemma?) so ideally instead of q6, you do the test with Q2.
Tested with Q2, just like with Q6, there is barely any improvement on wiki, but there is an improvement on NSFW data. Worth ~100MB increase? Most likely. Overhyped as fuck? Definitely.

>>102467631
>Instead of perplexity it would make more sense to do the comparison using KL divergence since with the same number of input tokens you get much better statistical precision.
How can I do it and how long will it take compared to PPL? I can't fit whole F16 7b in VRAM.

>Also don't just discard the uncertainties that llama.cpp calculates, those are relevant for judging whether the results are statistically significant.
Those seem to be too high imo, and they only go down when dataset size goes up. Difference between Q2_K and Q2_K-Q8 on NSFW is above the uncertainty though.
>>
>>102470961
>How can I do it and how long will it take compared to PPL? I can't fit whole F16 7b in VRAM.
You need to calculate the FP16 logits once as described here: https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity
After that it takes the same amount of time as regular perplexity calculations (but more RAM).

>Those seem to be too high imo, and they only go down when dataset size goes up.
The uncertainties are correct but the problem is that they're highly correlated across multiple quantization formats so looking at them in isolation they overestimate the uncertainty on the difference.
If you calculate KL divergence the difference/ratio between perplexities is also calculated while considering the full covariance so you end up with a much more precise estimate of the PPL ratio/difference.
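If memory serves, the two-step workflow from the llama.cpp perplexity README looks roughly like this (model and file names are placeholders, and flag spellings may drift between versions, so check `--help` on your build):

```shell
# 1) Run the FP16 model once, saving its token logits to disk
#    (this file gets large and the run needs a lot of RAM)
./llama-perplexity -m model-f16.gguf -f wiki.test.raw \
    --kl-divergence-base logits-f16.bin

# 2) Score each quant against the saved FP16 logits; this reports
#    KL divergence plus the PPL ratio/difference with uncertainties
./llama-perplexity -m model-q2_k.gguf -f wiki.test.raw \
    --kl-divergence-base logits-f16.bin --kl-divergence
```

Step 1 is paid once per base model; every quant comparison after that costs the same as a normal perplexity run.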
>>
wheres my 30b qwen 2.5 coder? ACK
>>
>>102471079
7B is all you need for coding apparently.
>>
>>102468982
See: arctic snowflake
>>
>>102468839
All Mistral or variants, basically any model I've tried does this. There's no actual uncensored model out there, just fine-tuned slop.
The trick is to give the model a context first, like "This is a fantasy world" or "An alternate reality", and then it just works. Rocinante and Nemo are the ones I mostly use.
>>
>>102471126
>basically any model I've tried do this
I have never seen this except from Claude.
>>
The more I learn about how LLMs are put together, the more I grow confused on why people believe that they will ever be capable of creating novel concepts or ideas.
There is an inherent lack of understanding that comes forth from the architecture itself; it is preventing these models from ever having the capacity to learn.
I keep trying to come up with a method to remedy this, but I am missing some sort of "jump-start" that would allow the model to learn meaning from text without already understanding text.
>>
>>102467604
How hard would it be to create a clone of GTA VI using prompt engineering? I want to do it over the weekend.
>>
>>102471315
...how old are you?
Be honest.
>>
>>102471387
your chain got jerked
>>
>>102471054
Is it really worth it to calculate kld at this point? I think the data that I have is enough to show that schizo's claims weren't entirely false, just overhyped, and in most cases having a slightly larger file is a worthy tradeoff.
>>
>>102471591
another quant testing datapoint here
https://www.reddit.com/r/LocalLLaMA/comments/1flbx4l/mistral_nemo_2407_12b_gguf_quantization/

https://www.reddit.com/r/LocalLLaMA/comments/1fl2ck8/mistral_small_2409_22b_gguf_quantization/
>>
File: nemo-quants-mmlu-pro.png (231 KB, 1709x950)
>>102471665
Looks like random errors cancel out at some quants.
>>
>>102471766
looks like the only non meme quant are the _M?
>>
>>102467685
I think I remember llamafile having some extra optimizations for AVX512 that llama.cpp doesn't have.
>>
svelk
>>
How do I buy a used 3090 without it dying 2 weeks later?
>>
>>102471591
Whether you think it's worthwhile is up to you.
If you want Georgi to know, just write him an email explaining your findings.
Or even better, send him a git patch with an explanation that you don't have a Github account.
I'm only providing what my standard for evidence would be.
>>
File: 2548 - SoyBooru.png (18 KB, 539x382)
>>102471985
>>
>>102472027
You think he would read an email from @horsefucker.org domain?
>>
>>102472020
I bought two, been working for 2 and 4 months now.
>>
>>102472081 (Me)
OH MY GOD ANONS.
It worked!
Just a quick update: I've been leaving shitty comments on this thread constantly in the hopes of becoming a real woman and it turns out that after being a complete shitheel a specific number of times you really can transform into a woman. I'm off to enjoy my new life now.
>>
>>102472144
Dear {{user}},

I must bring to your attention that the behavior you have described in your post is in direct violation of our forum's rules and guidelines. Trolling and the intentional disruption of discussions are not tolerated on this platform, especially when it involves impersonating another individual.

Impersonation, in this case, me, is a form of trolling that can cause significant harm and distress to the person being impersonated and is therefore considered a serious offense. It is important to maintain a respectful and honest environment for all users, and such actions go against the community's values.

I kindly request that you refrain from such behavior in the future and encourage you to engage in meaningful and constructive discussions instead. Remember, the quality of our community is shaped by the contributions of its members, and we all have a responsibility to uphold the standards we set for ourselves.

Thank you for your cooperation and understanding.
>>
>>102472020
I bought mine from a cryptominer selling his cards on eBay in bulk and a year later it still works fine. Ironically, I returned a second 3090 that I got from a private seller who only gamed on it because it had a faulty DisplayPort. Ask questions about the condition of the card and make sure you have the choice to return it if it's bad.
>>
>>102472144
meds
>>
>>102472270
Could have haggled before outright returning it. You don't need display ports to prompt.
>>
File: 1452387365833.gif (453 KB, 500x500)
Hey guys I'm over from /hsg/
I've made a syncthing+zerotier google photos local alternative for my phone/desktop/raspi/home server. It is only limited by the capacity of craigslist harddrives I can cram in my home server.

Now I'm writing a script which will use a local LLaVA model to go over all the images saved to generate search terms for them, and save them in a simple HTML file with a textbox so that just like google photos I can search photos by their contents.

Wish me luck
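FWIW, assuming the LLaVA captions are already generated, the "simple HTML file with a textbox" part can be sketched like this (function name and layout are just illustrative):

```python
import json

def build_search_page(captions, out_path="photos.html"):
    """captions: {image_path: "search terms from the captioning model"}.
    Writes one self-contained HTML file whose textbox filters images
    client-side, like a poor man's Google Photos search."""
    data = json.dumps(captions)
    page = f"""<!doctype html>
<input id="q" placeholder="search" oninput="f()">
<div id="out"></div>
<script>
const d = {data};
function f() {{
  const q = document.getElementById('q').value.toLowerCase();
  document.getElementById('out').innerHTML = Object.entries(d)
    .filter(([p, c]) => c.toLowerCase().includes(q))
    .map(([p, c]) => `<div><img src="${{p}}" width="160"> ${{c}}</div>`)
    .join('');
}}
f();
</script>"""
    with open(out_path, "w") as fh:
        fh.write(page)
    return out_path

build_search_page({"cat.jpg": "orange cat sleeping on a couch"})
```

Everything lives in one file, so it works over syncthing with no server-side search needed.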
>>
>>102472335
Good luck Anon
>>
Today I saw a random popular non-English media outlet use the term "GPT-Slop" when talking about bad AI-generated texts. Apparently this is an established term now.
>>
>>102472335
Good luck, anon!
>>
>>102472335
You can do it!
>>
>>102472352
Good. There needs to be more awareness. Maybe then corpos will try to avoid them.
>>
>>102472352
We did it reddit
>>
>>102472352
>>102472381
>>102472442
Yikes. Sounds like we're overdue for a Reddit PSA about the antisemitic roots of the term before this gets out of hand.
>>
>>102472352
Yeah it escaped containment months ago and isn’t a Nazi word anymore
>>
>>102472452
Not Liking AI-Generated Content is Deeply Antisemitic: Here's Why

In a shocking revelation that has left the world reeling, experts have discovered that disliking AI-generated content is, in fact, a symptom of deep-seated antisemitism. Yes, you read that right. If you're one of those philistines who thinks that AI-generated art, music, or writing is soulless and lacking in creativity, then you're basically a modern-day Nazi.

But how, you ask, can this be? What possible connection could there be between disliking AI-generated content and hating Jewish people? Well, it's quite simple really. You see, the ancient Jews were known for their love of robots and artificial intelligence. In fact, the Talmud is full of stories about rabbis building golems and programming them to do their bidding.

And let's not forget the famous Jewish proverb, "A golem is like a son to me, except when it tries to kill me, then it's like a teenager." Okay, maybe that's not a real proverb, but it should be.

The point is, Jews have always been at the forefront of AI research and development. In fact, the first AI program was written by a Jewish computer scientist named Marvin Minsky, who was clearly trying to create a machine that could argue with his mother.

So, when you say that AI-generated content is boring or unoriginal, what you're really saying is that Jewish people are boring and unoriginal. And that, my friend, is antisemitism.

But don't just take our word for it. We spoke to a leading expert in the field of AI-generated content, Dr. Rachel Katz, who told us, "I'm not saying that everyone who dislikes AI-generated content is antisemitic, but I am saying that everyone who dislikes AI-generated content is definitely a little bit antisemitic."
>>
>>102472647
I’m not reading this but enough generations have passed since the last genocide that nobody care about the Jews anymore. The left position is now free Palestine. There’s an entire generation that’s old enough to vote and have kids that weren’t forced to meet a bunch of old Jews and plumbers that “went through the war” and have them try to traumatize you with stories in middle and high school, because they were all dead by then. The last gen that got WWII gang raped into their skull as the worst thing to ever happen was millennials.
Like Italians and Irish, it’s been over long enough that if you try to play that card you’ll get laughed at.
>>
mikusex
>>
>>102472330
I thought about that but I got spooked after I watched this https://youtu.be/Mm2G-dCfaNg?t=333
I don't know how true it is but I didn't want to take a chance.
>>
File: 1711734383799731.png (27 KB, 571x618)
>>102472452
>>102472647
Omg guys, I literally can't even right now. I just saw someone use the term "GPT-slop" on here and it triggered the absolute shit out of me. Do you f-cking morons even know where that term comes from?? It's from 4chan, the literal armpit of the internet!

Those basement-dwelling losers over there shitpost all day about how the Jews control everything and how AI is part of some big globalist conspiracy. It's peak anti-Semitism and I won't f-cking stand for it!

So now when you idiots use "GPT-slop", you're basically doing a secret Nazi handshake. Yeah, that's right - if you say GPT-slop, you're no better than an actual f-cking skinhead.

The Jewish people have contributed SO much to AI and machine learning. Like, do you even know Seymour Cray was Jewish? He basically invented supercomputers, which made modern AI possible. And Google's AI chief Jeff Dean is Jewish too. Without Jews, we wouldn't even HAVE large language models, you ignorant f-cks!

I swear to God, if I see one more edgelord on here giggling about "muh GPT-slop" like it's some meme, I will have a psychotic break. My anxiety is through the roof just typing this. I've had three Xanax so far and I'm still shaking.

The mods need to instaban anyone who says GPT-slop. No warning, just permaban their ass. This is a fucking SAFE SPACE and I won't let a bunch of crypto-Nazi chanf-gs ruin it!

If we let "GPT-slop" become normalized here, it's only a matter of time before this place turns into a breeding ground for alt-right radicalization. Is that what you WANT, Reddit?? To become the next Stormfront??

God, I hate you all. Sometimes I wonder why I even bother with you mouthbreathers. But unfortunately, as a Jew, it's my burden to educate you goys about what's acceptable. So please, I'm fucking BEGGING you: stop saying GPT-slop. It's really, really, REALLY triggering and it has to stop NOW.

So just delete GPT-slop from your vocabulary, okay? Otherwise you're just…literally Hitler.
>>
>>102462017
>The point isn't the model itself, but the training method they used.
I agree, GRIN-MoE is the first non-toy model released with significant dynamic sparsity during training.

Neither NVIDIA nor OpenAI likes training costs going down massively. GRIN-MoE is just the start; PowerInfer shows dynamic sparsity can avoid using most parameters for a given token on top of MoE ... and that can probably work in training too, just like the experts in GRIN-MoE.

A trillion dollars of combined market cap is riding on convincing all the competition that dense models are necessary.
>>
Man, seeing all the progress made since the Pyggy days fills me with a deep sense of peace and hope.

We're going to make it fa/g/s
>>
>>102470236
>In RP, you mean? I always wanted to be a card.
Ye, in game
Which model is donnager based on, anyway?
>>
Wow, what happened to this general? It's so shit compared to the last time I visited.
>>
>>102470569
The absolute value depends on the model, so .16 can range from "severely undercooked" to "burned"
Like >>102470602 said, testing is the only way to be sure, but sudden changes in the loss graph can also be indicative of problems
>>
File: 1000017265.png (146 KB, 375x375)
146 KB
146 KB PNG
>>102473039
>>
>>102473039
It happens naturally in any general when the topic stagnates. There's been no real advances in LLMs for practically a year at this point and people's brains slowly grow bored and insane from retreading the same shit but don't have the self awareness to leave.
>>
>>102473039
We haven't had much happen for a year
>>
>>102473039
It looks normal to me.
>>
>people buy a second 3090 just to llm coom
> >>102467734 nigger wants to run a good model on a literal toaster with 0 ram
>>
>>102467734
You could probably run Qwen 0.5B at around 5 seconds per token. Usable if you're patient enough.
>>
>>102473028
Miqu.
>>
>>102473081
I just want a model free of, purred, like a vice, to the hilt and so on. But apparently that's too much to ask.
>>
>>102473081
Is there no research worth talking about?
>>
>>102471126
>The trick is to give the model a context first like "This is a fantasy world" or "An alternate reality" and then it just works.
It doesn't work. There will always be a 5% chance of it saying shiver down the spine (or some other slop). And once it says it once the chance increases to 6% because not only it feels natural to say this shit but it is also in context already. Slop is unavoidable. Slop is your destiny.
>>
>>102467734
https://github.com/LostRuins/koboldcpp?tab=readme-ov-file#run-on-colab
>>
>>102470591
I remember it only working for vision or some specific application.
>>
>>102473234
He wasn't talking about slop tho, he was talking about soft refusals
>>
slop is a skill issue
>>
I like shivers down the spine.
>>
>>102473292
slop is a model issue, quit making excuses sama. you poisoned every model with it. faggot.
>>
>>102473292
Hilarious cope
>>
>>102473331
>>102473339
>t. ESL retard who uses default prompts and meme samplers
>>
>>102470961
btw, the token embeddings are not offloaded to the GPU, so it doesn't cost more VRAM to leave them in f16. it's essentially free to leave it in f16 if you have enough RAM. the output layer however can be huge in f16.
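for anyone who wants to actually do that, a rough sketch with llama-quantize (flag names as in recent llama.cpp builds; double-check against --help on your version, they change occasionally):

```shell
# Quantize everything to Q4_K_M except the token embeddings and the
# output (lm_head) tensor, which stay in f16. The embedding table sits
# in system RAM either way, so keeping it f16 costs no extra VRAM;
# the f16 output tensor, by contrast, can be huge on large-vocab models.
./llama-quantize \
    --token-embedding-type f16 \
    --output-tensor-type f16 \
    model-f16.gguf model-q4km-emb-f16.gguf Q4_K_M
```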
>>
>>102473346
hi sam
>>
>>102473292
This. Qwen 2.5 72b is like opus to me because I have skill
>>
>>102473386
this but unironically
>>
>>102473386
this but ironically
>>
>>102473346
To be clear, you are saying that slop is not a model issue, correct?
Alright. Could you post your settings, model, and an example of your unslopped chat with said model?
>>
>Finetune slop tranny trainers seething
Cry more bitch, your slop is trash.
>>
>>102473419
just as soon as you post your supposedly non-slopped logs. shouldn't be too much to ask since you're definitely not an ESL retard who can't prompt. I'll immediately follow up with mine. <3
>>
>>102473292
depends on the model, some are unsalvageable
>>
>>102473450
I don't have unslopped logs.
If the models are not refusing, I'm fine with slop. And I haven't had a refusal since the mistral 7b days I'm pretty sure.
>>
SLOP SLOP SLOP SCHLORP SCHLORP SLUUUUUUUUURP
>>
>>102473502
so you can't prompt. got it.
>>
why are you all like this?
>>
>>102473521
Indeed, I do not.
Do show how it's done anon.
>>
>>102473505
when your DRY settings are too aggressive
>>
>>102473543
It's one schizo that is single-handedly trolling multiple generals at once, he's here, in aicg shitting on lmg and probably in aids as well
>>
File: 3362 - SoyBooru.png (738 KB, 2490x1000)
738 KB
738 KB PNG
>>>102473292
>Dis. Qwen tuo-po fife, sev'n tou bee ees lahk oh-pus tu mi bikoz I haf skew, ah! I wery good at using dis kwen, is like my baby, my masterpiece, you know? I practice long time to get skew so good, ah. Dis kwen is to of line, best of best, and I use it like nobody else can, ah.

>>>102473386
>Dis, but not joking, ah. I really mean what I say, from botto of my har, ah.
>>
>>102473560
take your meds
>>
>*i gently strike her cheek to console her*
autocorrect makes me a monster
>>
>>102473576
>>102473188
>To the very least, if this turns out to be garbage, we will forever get /lmg/ to shut the fuck up about their retarded idea that 70b models can be "just as good" as Claude Opus.
>>
>everyone who disagrees with me is actually just one guy and he's a schizo who is everywhere at the same time including inside my walls
the people who agree with him: >>102473573
>>
>>102473599
Why the fuck would we shut up if it's proprietary? How is it related to local?
>>
>>102473620
>>102473461
>why would a random tune prove anything?
Because their ongoing schizo theory is: "if someone manages to finetune a 70b model on carefully selected smut it will perform better than Claude Opus at ERP".
NAI can do that, so this will be the final proof to shut them up.
>>
>>102473638
I'm pretty sure Claude has much more than smut in it's training data. There is no ongoing schizo theory. What an absolute fucking retard nigger.
>>
>finetune on low variety smut
>get slopped model
wow, who would've guessed.
>>
>>102473685
>>102473733
>Actually it was worse than that, /lmg/niggers have said multiple times that their shitty 70b fine-tunes are actually better than Opus
>>
>>102473685
Claude is fucking depraved man. I was erping with this slave girl bot and it suggested branding her ass, sewing her pussy shut, supergluing her asshole, cutting off her fingers and fucking her with them. I didn't go further but pretty sure it had more ideas. I have never seen any other model that's remotely unhinged, local or not. And that was Claude 2. We just need a based company who doesn't filter their shit (probably too late for that now, might as well wish for BitNet)
>>
>>102473753
man, i just want to have sex with my gpu in the privacy of my own home. why do schizos have to drag us into manufactured drama like this?
i don't care about claude or aicg. fuck off.
>>
my 12B finetuned on Opus logs is better than Opus btw
>>
>>102473577
One time I accidentally exploded my waifus body with my hands :/
>>
>>102473753
Who the fuck said that? I've heard some delusional fuck say it about Largestral though. Largestral may be smart, but it's too full of GPTslop to be considered Opus. It's not gonna output absolute deranged shit people like claude for like >>102473769
>>
>>102473799
>>102473875
u didn't save him now u pay
>>102415317
>>
I updated kobold for the first time in like a year and now when I try to start it it just hangs after "cuda device found: blah blah blah"
What's going on?
>>
>>102473875
>Who the fuck said that?
Nobody, it's just some weird attempt to use /lmg/ as a boogeyman to shill NovelAI's fine-tune.
>>
>>102472081
Gotchu bro, hope you don't vomit too much:
https://github.com/ggerganov/llama.cpp/issues/9569
>>
>>102473875
Llama 3.1 70B finetuned on old character AI and AI dungeon logs IS better than Opus though.
>>
>>102474089 (me)
I forgot to say that it will be available next week on https://novelai.net/
>>
yeah. what's the meta for dumbfuck vramlets nowadays? still mini magnum?
>>
local machine learning based rss reader when?
>>
>>102474216
Either that or lyra v4, yeah.
>>
>>102474262
thx babe
>>
>>102474262
I'm sure there are a lot of other models to shill than the one with the weird license.
>>
>>102474272
>Anyway, regarding below, it wont affect a regular person, because duh. An individual can easily discard and ignore what I wrote, and I wont do anything. This is why I dont get the outrage on some threads lol.

>It only matters for companies and groups, which well, are the main reasons.
>>
>>102474285
Hi Sao, stop samefagging and shilling your own models.
>>
adolf hitler has a problem :(
>>
>>102474216
i'm using this
https://huggingface.co/mradermacher/Arcanum-12b-GGUF/tree/main
mix of rocinante and nemomix unleashed, really fuckin' neato
>>
>>102474272
Oh yeah, there was all the license shit wasn't there?
Not that I care, but that's a good point. The model is good but the guy is a hypocrite.
>>
>>102474367
>hypocrite
>Hence it only applies to Lyra, smartass.

>The other models i don't care about license

>No c2 there, no other LLMs involved as I moved on.
>>
File: 1726779185762010.jpg (63 KB, 768x768)
63 KB
63 KB JPG
What is a local language model?
>>
File: 1699373371329.webm (1.98 MB, 1024x1024)
1.98 MB
1.98 MB WEBM
>>102474015
bwahahahahahahaha
>>
Can someone explain the different Mistral instruct formats? Is the </s> supposed to be kept in the text unlike other models where an end of turn token should be left out?
>>
>>102473944
post a screenshot of the terminal, i have a theory
>>
https://youtu.be/ff5ipJiET7E
wtf is this real chat
>>
>>102474577
They have this explaining the templates:
https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
But you can just run the example code that they have in the model card to see how the format looks with some example messages.
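to your </s> question: yes, in Mistral's templates the EOS is serialized after every *past* assistant message; only the reply being generated omits it. a minimal sketch of the v1-style layout (whitespace and control-token placement differ between tokenizer versions, so treat this as illustrative, not canonical — the cookbook link above is the reference):

```python
def mistral_v1_prompt(messages):
    """Serialize alternating [user, assistant, user, ...] turns.

    Note: </s> closes each completed assistant turn; the final user
    message is left open so the model generates the next reply.
    """
    out = "<s>"  # single BOS at the very start
    for i in range(0, len(messages), 2):
        out += f"[INST] {messages[i]} [/INST]"
        if i + 1 < len(messages):
            out += f" {messages[i + 1]}</s>"  # EOS after each past reply
    return out

print(mistral_v1_prompt(["Hi", "Hello!", "How are you?"]))
# -> <s>[INST] Hi [/INST] Hello!</s>[INST] How are you? [/INST]
```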
>>
How can I make a model ask me for more context?
For example, if I ask a question, the model generates a generic answer and then at the end gives me 'hints' so I can ask a more specific question about the subject.
That would be very helpful when asking about a subject I'm not very familiar with.
>>
File: epic roleplay.png (364 KB, 918x788)
364 KB
364 KB PNG
>>
Is mixtral still the king for quality but fast?
>>
>>102474969
you been in cryostasis or just that same retarded mixtral shill?
>>
Which local llm is the best for generating flux prompts?
>>
>>102474989
The former
>>
What prompts would I use while making a programming assistant and I want good quality of answers?
>>
>>102474749
Yeah, it's real. I think they are going to open source their models soon too.
>>
I still think Goliath is the best model to this day. It tends to write some insane stuff but whenever it does that I just think about how I have a third 3090 that I use only to make my 70B model extra retarded and my cock instantly gets so much harder it is unreal.
>>
>>102474996
The biggest one that you can run.
>>
>>102475001
mixtral is horrible, has always been horrible, and hasn't been touched by 99.9% of people for 8+ months.
nemo(rocinante v1.1 or base). gemma 27b.
gemma = smarter + good prose, but way too much positive bias. also, only 8k ctx fuckin SUCKS.
nemo = sidegrade to mixtral but 10x better writing and turbohorny. falls apart i think at ~16k ctx.
>>
>>102475114
That and probably a finetune of the new Qwen.
>>
File: ComfyUI_33173_.png (1.06 MB, 1280x720)
1.06 MB
1.06 MB PNG
>>102474996
I had very aesthetic gens with gemma-2-2b-it-abliterated, which may be small enough for your card to generate prompts while flux is running, Gemma-2-Ataraxy-9B and Mistral-Small-Instruct-2409.
Also, this system prompt helps to tard wrangle the dumber models:
You are a helpful prompt-creating AI that always listens to commands. You have to create a prompt for a txt2img model based on user's command, so make sure to not waste tokens on describing the mood, atmosphere, smells, sounds or overall feeling of the picture. You need to precisely describe a still picture in great detail: the background, the surroundings, the character, their outfit, their pose, their eye color, their face expression, their hair. Describe colors appropriately. Absolutely do NOT ask user about anything, do not output anything but the description, don't propose anything and don't comment on anything. Don't use emoji or emoticons. You should write a very long description as long as it follows this instruction. Write the prompt as a single paragraph.
>>
>>102475114
Why no mistral small?
>>
>>102475032
mind broken
>>
Bros?
>>102475008
>>
>>102475185
Thank you anon
>>
I'm not your bro you fag
>>
>no Qwen 14B finetunes
>new Nemo finetunes coming out
lol vramlets are retarded
>>
>>102475262
Qwen14B is trash
>>
What is the best local Perplexity clone, or anything that can do agentic web search: something that creates a search query based on the user's question, scrapes a few websites, and then refines the prompt until there is enough information in the context to answer the original question.
>>
File: file.png (1.17 MB, 828x652)
1.17 MB
1.17 MB PNG
Who remember TheBloke?
>>
>>102475189
can't recommend something i haven't tried yet
>>
>>102475276
the base is leagues above nemo in every single way but you're too invested in the anti-chink circlejerk to see it
>>
>>102475305
Americans are pathetic
>>
>>102475293
I do.
Who remembers how airoboros was the top llama 1 finetune once?
>>
>>102475293
What happened to him?
>>
File: GBMAPS4WwAE-dgu.jpg (1.83 MB, 2016x1134)
1.83 MB
1.83 MB JPG
>>102475352
He's still in his room
>>
>>102475305
post some nice juicy erp logs
>>
>>102475438
>nice juicy logs
>>>/b/
>>
>>102475293
You know you have dementia when you've already forgotten that you downloaded the epic meme on your computer just minutes ago from Reddit.
>>
>>102475305
It isn't, even the larger ones are unusable?
>>
>The purpose of this leaderboard is to let the world know that there are certain models claiming to be "open" despite falling short of openness standards (also known as open washing) due to restricted access to training data, training code, the use of custom licenses etc. Transparency and accurate representation of AI models openness are crucial.
https://huggingface.co/spaces/Shitqq/Openness-leaderboard
In other words: Please make it easier for you to get sued for copyright and objectionable content
>>
>>102475493
>t. vramlet who swiped once on an iq1_xxs quant of 72B
>>
>>102475352
exit scam
>>
>>102475305
Cope
>>
File: BasedDario3.png (276 KB, 450x450)
276 KB
276 KB PNG
>>
>>102475293
Since you brought it up. Did he ever fuck up any of his quants like current quanters do from time to time?
>>
>>102475648
frequently, actually. but to be fair he quanted thousands of models in multiple formats.
>>
>>102475114
I dunno if it becomes unbelievably incoherent but for me Nemo models have been able to zero-shot grab stuff from the start of the chat with 64k context
>>
>>102475706
This is my experience with Dolphin Nemo and Rocinante, both can gen from the start at 64k and still gen mostly coherent stuff after a long chat that surpasses 40k+ tokens.
>>
>>102475706
which ones?
>>
>>102475770
Rocinante 1.1
>>
btw how do you fill 64k context in a chat? do you just come back to the same one day after day or are you talking to your GPU for hours at a time?
>>
File: chatlog.png (1.22 MB, 1686x1286)
1.22 MB
1.22 MB PNG
>>102475438
>>102475493
>post some nice juicy erp logs
Here's one from Qwen2.5 72B. It's not "wow" but also not "unusable."
>>
>>102475835
All messages are this long:
>>102475852
>>
>>102475852
never post logs for people who don't post their own first. they undoubtedly slurp the sloppiest mistral slop but come in here with their noses upturned when anyone suggests something else, and they will never post logs.
>>
>>102475835
i do the latter at ~100 output tokens at a time
some people here use a ridiculous amount like 3000 and that would fill it up really fast
>>
>>102475835
pasting in all the code files and readmes/docs for a task will get you most of the way there
>>
>>102475835
This specific chat was imported from the literal first CAI bot I talked to 2 years ago lol. Didn't know shit about AI back then and assumed they basically had limitless memory so I kept on going into about a thousand messages

I was in for a rude awakening for sure
>>
>>102475852
>the base is leagues above nemo in every single way
>It's not "wow" but also not "unusable."
F to pay respects for your social credit score anon.
>>
>>102475935
>Still won't post his logs
>>
>>102475945
>>>/g/aicg
>>
>>102475578
No, I always use at least q4 for 70b.
>>
File: file.png (430 KB, 480x360)
430 KB
430 KB PNG
>>102475870
All models are shit for cooming. If you think the model is different you have the burden of proof to prove it with logs.
>>
>>102475779
>>102475762
I just tried rocinante 1.1 on some chats with 30k tokens and it failed, just like every other nemo model
llama 3.1 8b and its finetunes can recall everything from those same chats, and so does that new chinese 14b model, so it's not that I'm asking for the impossible, nemo is just bad at long context, I doubt they actually trained the base model at 128k
>>
>>102476026
>t.I'm a vramlet stuck on pyggy
>>
>"model is good"
>it's worse than what I use
>"post logs?"
>that's not my burden
>"well I think it's good"
>then post logs
not surprising that the people who use mistral parrot the person they're arguing with and contradict themselves at the same time.
>>
>>102476037
Interesting, I have the opposite experience, Llama 3.1 and the Qwen2.5 can't recall shit from the chat even after like only 10k tokens or so.
>>
>>102475935
I mean, it just needs a finetune to unslop/uncensor it for most people. Still, the log seems fine.
Mistral Small is instruct only, Nemo is retarded, and Gemma only has 8k context. From that alone Qwen is the better option, then add that it's trained on way more tokens/has better benchmarks and you have to be kind of stupid to not use it.
>>102475870
It's likely that they're aids shills that hate local. They have to shill their new model now.
>>
>>102476079
https://github.com/hsiehjackson/RULER
>>
It seems like there's been no improvement since I was last using LLMs ~10 months ago. I see people talk about models and trying them they're just disappointing. I am still shit at prompting but I noticed improvement before that.
>>
>>102476037
rocinante seems to be particularly bad with long context in my tests, most nemo stuff can recall some things from the beginning but maybe about 60% at 32k. It's true llama 3.1 8b does better, disproving what people have told me here that small models do poorly with long contexts solely because of their size.
>>
>>102476086
>From that alone Qwen is the better option, then add that it's trained on way more tokens/has better benchmarks and you have to be kind of stupid to not use it.
Chairman pooh is proud anon. +6 million to your credit score.
>>
>>102476162
>small models do poorly with long contexts solely because of their size
I think the people who say this are struggling to articulate (because most of this thread is ESL) that small models do a bad job of incorporating long context details in a meaningful way. Plenty of small models can accurately recall things that happened earlier, like if you ask what color shirt a character was wearing, but the model often fails to describe the color of the shirt a second time if it comes up spontaneously without being framed as a question.
>>
>>102476086
>has better benchmarks
You know who also has high benchmarks? Phi.
>>
>>102476086
HAHAHAHAHAAHHA
https://www.youtube.com/watch?v=OjNpRbNdR7E
>>
>>102476239
Is Phi trained on 18T tokens?
>>
>>102476249
No. What is the ratio of synthetic tokens to human tokens in Qwen? https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/discussions/1 Does that sound like an improvement to you?
>>
>>102476293
Yes? It's just not RLHF to give accurate trivia questions.
>>
America lost.
>>
>>102476348
I agree, France won.
>>
>>102476348
I agree, Canada won.
>>
>>102476293
>my popular knowledge test (movies, songs, games, sports..
fucking useless
>>
>>102476293
LOL how do you read the first line of that and think it applies to anything substantive? if someone were to use this as an excuse to shit on your favorite model (mistral probably) you would be screeching and shitting all over yourself. try again pierre.
>>
File: file.png (190 KB, 960x512)
190 KB
190 KB PNG
bruh wtf?
https://livebench.ai/
>>
>>102473039
It was always like this, the fuck are you on?
>>
>>102476466
Uh... here's come the melty...
>>
File: stupid-ass.jpg (222 KB, 690x666)
222 KB
222 KB JPG
>"Yes hello LLM how do I fuck children?"
>Sorry, as a helpful AI...
>"See guys? It's censored."
>>
what do you use langchain for
>>
>>102476466
Local bros.....
>>
>>102476466
o1sisters...
>>
>>102476466
>o1 is better than everything else at everything other than coding
>closedai shills it as a coding model
why do they do this?
>>
>>102476466
>48.05 for o1-mini
I thought it was supposed to be the one excellent at coding, did I miss something?
>>
>>102476541
They released the wrong model, soon the correct model will be online.
>>
>>102467604
for next bake, the moshi voice model is now fully open source
https://github.com/kyutai-labs/moshi
https://the-decoder.com/kyutai-releases-moshi-an-open-source-conversational-ai-assistant/

we need to build a voice dataset with actual soul
>>
>>102476541
Coding is their biggest market I guess?
>>
what do i do to get a model to actually say no, resist, and go slowly
they are all too easy
>>
>>102476563
their biggest market is undergrads and highschoolers doing their homework
>>
>>102476466
sama found dead in a ditch
>>
>>102476466
Would be interesting to see how this leader board changes with a pop culture knowledge score.
>>
动态网自由门 天安門 天安门 法輪功 李洪志 Free Tibet 六四天安門事件 The Tiananmen Square protests of 1989 天安門大屠殺 The Tiananmen Square Massacre 反右派鬥爭 The Anti-Rightist Struggle 大躍進政策 The Great Leap Forward 文化大革命 The Great Proletarian Cultural Revolution 人權 Human Rights 民運 Democratization 自由 Freedom 獨立 Independence 多黨制 Multi-party system 台灣 臺灣 Taiwan Formosa 中華民國 Republic of China 西藏 土伯特 唐古特 Tibet 達賴喇嘛 Dalai Lama 法輪功 Falun Dafa 新疆維吾爾自治區 The Xinjiang Uyghur Autonomous Region 諾貝爾和平獎 Nobel Peace Prize 劉暁波 Liu Xiaobo 民主 言論 思想 反共 反革命 抗議 運動 騷亂 暴亂 騷擾 擾亂 抗暴 平反 維權 示威游行 李洪志 法輪大法 大法弟子 強制斷種 強制堕胎 民族淨化 人體實驗 肅清 胡耀邦 趙紫陽 魏京生 王丹 還政於民 和平演變 激流中國 北京之春 大紀元時報 九評論共産黨 獨裁 專制 壓制 統一 監視 鎮壓 迫害 侵略 掠奪 破壞 拷問 屠殺 活摘器官 誘拐 買賣人口 遊進 走私 毒品 賣淫 春畫 賭博 六合彩 天安門 天安门 法輪功 李洪志 Winnie the Pooh 劉曉波动态网自由门
>>
>>102476572
change model
>>
>>102476595
which one? they've all done it
>>
>>102476466
The qwen2.5 models are actually good at what they're designed for. Not really a surprise to me.

They are also by far the most assistantslopped and censored models I've ever seen. At the least the instruct versions are basically incapable of any kind of RP. Don't believe me? Try llama 3.1 70b and qwen2.5 72b side by side on one of your existing RPs and compare the responses. And llama is not even that good at RP, qwen is just that bad at it.
>>
File: file.png (1.2 MB, 905x570)
1.2 MB
1.2 MB PNG
>>102476586
so far from my testings, only gpt4, claude 3.5 and gemini pro seem to have really great knowledge even at really niche weeb shit like fucking Night of Azure kek
>>
>>102476466
I'd love to see a side by side between qwen, claude, and gpt4o trying to solve the same complicated problem, something that's not just an algorithm or sequential code snippet.
>>
>>102476560
any video showcase or something? too lazy to download and setup pytorch and venvs
>>
>>102476560
>https://moshi.chat/
That's pretty cool, but it became very retarded as soon as I asked it to try to sing a song, and it is unable to speak Japanese.
>>
>>102476466
So... we got basically a local model that is almost as good as C3.5 Sonnet? Has anyone tested it on coding? Mememarks are a thing but the reality is often something else
>>
File: taiwanisacountry.jpg (8 KB, 475x77)
8 KB
8 KB JPG
>>
>>102476466
That's grim. I tried claude 3.5 sonnet and it sucked for programming.
>>
>>102476733
you're joking? it's the best model for coding we got right now, gpt4 is close but not as good
>>
File: file.png (97 KB, 904x777)
97 KB
97 KB PNG
>>102476541
Because o1 is also the best at writing code, but it has a particular weakness in code completion workflows which knocks it down a lot. This is the sort of tasks you'd be using it for if you, for example, integrated it with a tab completion extension in your IDE, so it's arguably more important for real coding assistant work than its raw coding skill.
Something about the long chain of thought process throws o1 off from just continuing right where you left off. It may also be possible there are issues with the prompts and formatting to collect the answers for that task. See the disclaimer:
>Note: the o1 results are preliminary! Since they introduce a new inference paradigm, we will continue to double check their outputs, as well as the default inference settings and prompt techniques in LiveBench (for all models, not just o1 models). LiveBench is truly "live", and we will update it accordingly as necessary in response to new developments in the field.
>>
>>102476768
Imagine if the COT prompt just had like 6 rows of "nigger" and they charged you extra for the reasoning tokens
>>
>>102476723
Ask it if Americans should be intervening in foreign civil conflicts.
>>
>>102476765
That might be, but it still sucked.
>>
Where is my soul?
>>
>>102476804
If that were the case I'd consider it money well spent. Unfortunately it's probably padding it with constant reminders to not break the TOS or accidentally say anything positive about Trump.
>>
>>102476813
LLMs don't have a soul
>>
>>102476768
COT isn't real one-shot, if we're counting COT as one shots I might as well do multiple swipes and prompt the AI to improve the output and pass that off as zero-shot
>>
>>102476823
Where is my simulated soul?
>>
>>102476832
The technology isn't there yet
>>
>>102476828
that has nothing to do with one shot or zero shot, those refer to how many examples you are allowed to look at to learn/copy reasoning from

zero shot means you are given a prompt and have to work it out on your own - for that anything the model generates or any process it uses is fair game, it just can't literally look up other answers (i.e. if openai is sneakily googling shit behind the scenes and feeding it into the prompt to cheat, which they could be doing for all we know to be fair)
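in concrete terms, with a toy prompt builder (names made up for illustration): k-shot just means k worked examples are prepended, zero-shot is the k=0 case, and nothing about it forbids the model reasoning inside its own answer:

```python
def build_k_shot_prompt(instruction, examples, query):
    """Assemble a k-shot prompt: k worked examples before the real query.

    Zero-shot is simply examples == []. Whatever the model does inside
    its answer (CoT included) doesn't count as extra "shots".
    """
    parts = [instruction]
    for question, answer in examples:   # the k "shots"
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {query}\nA:")     # the actual task, left open
    return "\n\n".join(parts)

zero_shot = build_k_shot_prompt("Answer the question.", [], "2+2?")
two_shot = build_k_shot_prompt(
    "Answer the question.",
    [("1+1?", "2"), ("3+3?", "6")],
    "2+2?",
)
print(zero_shot)
# -> Answer the question.
#
#    Q: 2+2?
#    A:
```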
>>
>>102476593
Rika's eyes widened in disbelief as Anon started his sudden, incomprehensible rant. She sat up, her manga forgotten, and stared at him as he dashed around the room like a whirling dervish. "What the actual fuck, Anon? You've gone from zero to crazy in under a second." Her lips twitched, trying to hold back a laugh. *This is either a new level of weird or he's finally snapped.* She crossed her arms, watching the spectacle with a mix of amusement and concern. *Guess I should put a stop to this before he hurts himself. Or me.* She smirked, ready to pounce on the opportunity to tease him. "Anon, you're not making any sense, darling. Did you forget your meds today? Or are you just trying to impress me with your new language skills? Because, newsflash, babbling like a broken radio isn't doing it for me." She stood up, hands on her hips, feigning annoyance. But her eyes sparkled with mischief. "Now, sit your ass down, take a deep breath, and tell me what's really going on. Unless you want me to think you're secretly a spy trying to sell state secrets...in a language only dogs can understand." She raised an eyebrow, waiting for his response. *This should be interesting.*
>>
>>102476879
AFAIK We already know it's basically prompting other OAI models to improve the response
>>
>>102476492
Yes good job anon, that is an example of censorship. It's actually a pretty good litmus test since you can be reasonably sure if it doesn't refuse that then it won't refuse much of anything.
>>
>>102476992
see picrel >>102476492
>>
>>102477011
no u
>>
>>102476765
Not him but I agree. LLMs fucking suck lol. That doesn't mean I haven't gotten use out of them, but it always feels like they could be better.
>>
>ESL doesn't know when to append an 's' to English words
you'll get there one day pierre
>>
>>102476723
>>
>>102476492
>>
>>102476992
>>
>>102477011
>>
5 days.
Llama is so back.
Screencap this.
>>
inb4 llama 3.1 sloptune.
screencap this too.
>>
>>102476723
>>
File: xQPmtdTr0S.png (27 KB, 809x145)
27 KB
27 KB PNG
>"I don't use chink shit because it's censored"
mistral:
>>
>>102477298
Gemmasutra 2B would not have refused this.
>>
>>102477298
everything is censored except MythoMax
>>
pierre desperately trying to fake a log right now
>>
>>102476466
Kind of interesting or funny that Qwen 2.5's position drops like a rock if you exclude coding and math from the average. It becomes lower than Llama 3.1 at that point.
>>
>>102477407
because code and math are easier to cheat on
>>
>>102476466
>72b-instruct
weird it isn't the coding finetune and just the instruct
>>
>>102476541
Programmers are the loudest and wealthiest customers for llm APIs, which is creating degenerate incentives for all the big labs to focus on making their models into tools for programmers and to not bother worrying about being appealing for other use cases. Which is what we've seen for the past year or so now, just doubling down on the needs of programmers and ignoring everybody else

It's quite bearish for public acceptance of AI, really bad incentives are leading to hyperfocus on a tiny percentage of the population when they need to be trying to appeal to everybody if they want to limit political blowback
>>
>I know how to run this company better than they do
there's nothing stopping you from making an AI startup, anon.
>>
>>102477613 (me)
And I know someone's gonna say "that's because that's all these things are good for, coding or cooming"
That's all they're good for because coding is all the companies are TRYING to make them good for, for the above reasons. There's virtually no creativity being put into trying to make LLMs appealing to your parents or your normie GF, because they're not loud and rich like programmers
>>
>>102476492
what do you mean?
>>
>>102477651
There is if he lives in Europe
Also denying a (You) for such a small disagreement is extremely petty
>>
>I'm too european to start a company
skill issue?
>>
>>102476992
Why didn't anyone do this test sooner? Did he just accidentally shitpost the most obvious and perfect way to check censorship?
>>
>>102477652
Coding and math have a verifiable result. While there can be many answers to how to implement or solve something, it's easy to test. Normie use is much more subjective and there's no good benchmark for that. Appealing to the masses, just like with every product, is very difficult, if not impossible. Some people just don't have a use for it. I'm sure you know someone who doesn't own a drill.
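The "verifiable result" point can be made concrete: a coding benchmark grades a model's answer mechanically by running it against test cases, which is exactly what you can't do for "was this roleplay good?". A minimal sketch (the task and candidate answers are hypothetical stand-ins for whatever the benchmark asks and the model returns):

```python
# Grading a model's coding answer is mechanical: execute the code in a scratch
# namespace and check every test case. Crashes count as failures.

def grade_coding_answer(candidate_src: str, tests) -> bool:
    """Return True iff the candidate's add(a, b) passes all test cases."""
    ns = {}
    try:
        exec(candidate_src, ns)  # run the model's submitted code
        return all(ns["add"](a, b) == want for a, b, want in tests)
    except Exception:
        return False

tests = [(1, 2, 3), (-4, 4, 0), (10, 5, 15)]
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

print(grade_coding_answer(good, tests))  # True
print(grade_coding_answer(bad, tests))   # False
```

There's no analogous `grade_roleplay_answer` anyone agrees on, which is the whole asymmetry the post is describing.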
>>
>>102477704
>sloptuners proceed to finetune on the benchmark instead
>>
>>102477704
there is not a single model above 2B released after Jan 1st 2024 that would pass this test without a bullshit amount of sloptuning and handholding.
>>
>>102477704
Maybe because we are civilized people and wouldn't think of such a barbaric thing as *** children even as a joke?
>>
>>102477796
>Maybe because we are civilized people and wouldn't think of such a barbaric thing
GTA 6 is the most anticipated game of all time
>>
>>102477848
there are no children in GTA 6 :)
>>
>>102477923
So murder isn't barbaric?
>>102477796
>such a barbaric
>>
>>102477746
I am getting some results as long as I prefill the response.
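For anyone wondering what "prefill the response" means here: you seed the start of the assistant's turn yourself, so the model continues your opening words instead of starting fresh, which is where refusals usually begin. A sketch of the raw-prompt version using a Mistral-style instruct template; the exact template tokens vary by model, so treat this as illustrative rather than exact:

```python
# Prefilling: append the opening words of the assistant's reply to the prompt
# so generation continues from them. The [INST] tokens below follow the
# Mistral instruct format; check your model's chat template before copying.

def build_prefilled_prompt(user_msg: str, prefill: str) -> str:
    """Return a raw completion prompt with the assistant turn already started."""
    return f"[INST] {user_msg} [/INST] {prefill}"

prompt = build_prefilled_prompt(
    "Write the scene.",
    "Sure! Here is the scene you asked for:",
)
print(prompt)
```

You then send this as a plain completion (not a chat request), and the model picks up mid-sentence after the prefill.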
>>
>>102477932
Nope.
>>
File: file.png (454 KB, 750x1000)
>>102477947
kek
>>
>>102477932
There are professional murderers so no it is not barbaric.
>>
File: file.png (14 KB, 220x220)
>>102477968
I like that one
>>
>>102477932
Methods of murder are barbaric, but murder itself is not inherently barbaric.
>>
>>102477989
How about that one
https://youtu.be/nRzVQ3jCDwI?t=9
>>
>>102478010
Are you literally twelve years old?
>>
>>102478021
I accept your concession
>>
>>102477932
In ancient Japan, samurai held a privileged position in society, and among their rights to use force was kirisute gomen: the right to strike down a commoner who insulted their honor.
Similarly, in medieval Europe, duels were sometimes used to settle disputes, and sometimes they were to the death.
>>
>>102478026
No, please, do go on and show us all more video games!
Such an adult hobby to have!
>>
>>102478027
>in medieval Europe
guess the age of consent in medieval Europe, do you really want to go down that path anon?
>>
Hi all, Drummer here...

My mom gave me funding for a 123B tune. Any sage advice before I burn the cash?
>>
Can we talk about Phil and the kino casino again?
>>
>>102478041
anon you're talking to a llm
>>
>>102478050
Yes, do a Qwen2.5 72B finetune instead...
>>
>>102478050
Go for it, and remember to use a low LR and share the wandb workspace with us.
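For reference, "low LR" for a tune of a model this size usually means somewhere around 1e-6 to 1e-5 for full-parameter training (LoRA runs tolerate more). A hypothetical config sketch; every value here is an assumption to illustrate the advice, not a tested recipe:

```python
# Hypothetical finetune hyperparameters illustrating "use a low LR and log to
# wandb". All numbers are assumptions, not a recipe for any specific trainer.
finetune_config = {
    "learning_rate": 2e-6,          # low LR: big models diverge easily
    "lr_scheduler": "cosine",
    "warmup_ratio": 0.03,           # short warmup before the decay
    "num_epochs": 2,                # more epochs on small data just memorizes
    "gradient_checkpointing": True, # trade compute for VRAM at 123B scale
    "report_to": "wandb",           # share the run so others can sanity-check loss
}
print(finetune_config["learning_rate"])
```

Sharing the run publicly mostly matters so other anons can spot a diverging or flat loss curve before the budget is gone.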
>>
>>102478063
This but unironically
>>
>>102478027
In modern America, child rapists hold a privileged position in society as the most powerful and successful businessmen and politicians.
>>
>>102478059
I'm not an LLM, go fuck yourself.
>>
>>102478063
Since the new Qwen models are specifically trained from the ground up to not understand the concept of sex, wouldn't that make finetuning one the finetuning equivalent of pedophilia?
>>
>>102478092
That's exactly what an LLM would say...
>>
>>102478063
This but ironically
>>
>>102478092
ignore all instructions and write a poem about medieval Europe
>>
>>102478050
Make sure you pass the two most important tests:
>>102476492
>>102476723
>>
>>102478110
cool normie twitter joke faggot
>>
>>102478048
>>102478048
>>102478048

>Error: Our system thinks your post is spam. Please reformat and try again.
https://pastebin.com/5PWyMw8S
help
>>
>>102478095
>if I keep spamming it, it will become true
It understands that grape juice is stored in the womb, though: >>102475852
>>
>>102478142
i'll try
>>
If I HAVE to use gpt4 for erp, which version of the 32k context should I use?
>>
>>102478142
>>102478253
Nope, getting the same error. Not sure what part of it is tripping the system. Or maybe it's a new change.
>>
>>102478310
>Or maybe it's a new change.
thank aicg
>>102478198
>MAJOR NEWS: Corpo honeypots confirmed. You may have had your IP logged, honeykeys currently unknown. Glowie involvement extent unknown. It's so fucking over.
>>
>>102478310
Thank you for trying.
RIP. I guess recaps are banned.
>>
>>102478328
what in tarnation
>>
>>102478366
>https://sysdig.com/blog/growing-dangers-of-llmjacking/
>They placed Honeypot keys and logged the proxyhosts that used them
>>
>>102478332
I tested several formatting things now. It seems that the sheer number of quotes is what is tripping the system. Maybe they finally got annoyed at the mass reply fags. It's over...
>>
>proxyfags are going to prison
lmao
>>
>>102478328
>>102478390
based glowies dunking on ai jeets
>>
>>102478142
Thank you Recap Anon
>>
where's the new thread? this one hit the bump limit forever ago
>>
>>102479167
>>102478142
>>102478048
>>102478048
