/g/ - Technology





File: miku-hand-out+.jpg (236 KB, 584x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101556980 & >>101553102

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1685790540409069.jpg (197 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101556980

--Mistral Large Instruct 2407 configuration discussion: >>101558047 >>101558099 >>101558116 >>101558134 >>101558149 >>101558154 >>101558583
--Llama 3.1 models removed from benchmark chart, L3.1 8B praised for accessibility and SOLV: >>101557528 >>101557550 >>101557594 >>101557614 >>101558284
--Hugging Face's profitability and sustainability: >>101557631 >>101557747 >>101557792 >>101557849 >>101558031
--OpenAI offers GPT-4 fine-tuning to tier 4 and 5 users: >>101558481
--Hiding timestamps and showing models in SillyTavern: >>101557105 >>101557166 >>101557243
--Anon shares their regret after paying for Claude and another user shares their experience with multiple AI models: >>101557317 >>101557453
--Mistral Large 2 (2407) performance and prompt template impact: >>101557016 >>101557018 >>101557033 >>101557144 >>101557127
--Logs: Anon shares a chatlog generated by large 2: >>101558282
--Logs: Zhanglii presents a humorous origin story for an internet slang phrase, "Dicks out for Harambe," leading to a lighthearted conversation with Johngi.: >>101557207
--Ollama guy fixes llama 3.1 rope scaling factors: >>101557334
--Llama version 3 naming controversy: >>101558609 >>101558613 >>101558627 >>101558636 >>101558649
--Discussion about cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf: >>101558208 >>101558545 >>101558600 >>101558652 >>101558563 >>101558574
--Availability and potential issues of new Mistral quants: >>101557744 >>101557771 >>101557891
--hfchat.py script and desire for Mistral Large model: >>101559118 >>101559135 >>101559156
--Comparing 200B parameter models with the human brain: >>101557573 >>101557623 >>101557675 >>101557720 >>101557751 >>101557788 >>101557976 >>101558119 >>101558171 >>101558393 >>101558585 >>101558608 >>101558686 >>101558438 >>101558446
--Miku (free space): >>101557331

►Recent Highlight Posts from the Previous Thread: >>101556983
>>
>>101560019
What a shit recap.
>>
>>101560063
rude
>>
>>101560019
recap anon... are you okay..? you've fallen off...
>>
>>101560123
explain step by step what is wrong with the recap
>>
>>101560019
we've complained so much about companies making slop that we failed to realize we are the slop
>>
>>101560019
I like this Llamiku
>>
>>101560019
>SOLV
I've always been saying how utterly garbage these recaps have been for the past few months, but this really takes the cake.
>>
>>101560145
that's really quite profound
>>
largestral is like CR+ but smarter and better in every way
except their tokenizer which fucking sucks ass and infuriates me as I watch the word "craftsmanship" crawl across my screen split into 3 tokens at a 2t/s clip
but wow the sovl and practical intelligence in RP, this thing feels great
>>
>>101560019
Your recaps are garbage and you're a loser, go touch grass.
>>
>>101560219
If you want a dumber but much faster version Nemo is worth trying. Hopefully mistral ends up releasing a mid sized model.
>>
>>101560219
do we have a new ERP king?
>>
>>101560227
Woops, forgot to remove the trip. Ignore that.
>>
>>101560232
>Mixtral-Instruct-2409
Though it has been relegated to "research" as per
https://mistral.ai/technology/#models
Seems MoE was a meme
>General purpose models
>Mistral Nemo
>Mistral Large 2
>Research models
>Mixtral
>Available in 8x7 and 8x22 sizes
>>
>>101560019
by far your worst recap in months
>>
>>101560268
The thread was also awful to be fair.
>>
Does that Route LLM thing work with models on the local network? or is it either you have ollama running on the machine, or you use a service
>>
>>101560260
>Seems MoE was a meme
MoE is a meme for open-source model releases since barely anyone can afford to run them properly; you might as well release a dense model. After all, you don't get paid for open-sourced models, so they're only as valuable as the audience that can use them.
internally though...
>>
Has anyone tried Meta-Llama-3.1-70B-Instruct yet assistant? It's taking me 3 hours to download.
>>
>>101560406
you can't run it
>>
>>101560240
in my opinion yes, and it isn't particularly close either
before there were a few choices I would bounce back and forth on (CR+, wiz8x22, qwen/magnum) but this kind of smokes all of them tbdesu. sucks how slow it is but I hardly ever want to reroll with this thing, it just gets every single card I throw at it and clearly knows how to write smut.
>>
Mistral won
>>
>>101560454
With koboldcpp? Why not?
>>
>have to run mistral large at 4bpw
no...
>>
>>101560464
Is it true? Did we finally get Sonnet/Claude but at home? I've long since stopped using 4o but I still find myself using Claude/Sonnet 3.5 for *safe* shit. Did the french actually succeed?
>>
>>101560514
at all
>>
I tried vLLM's distributed inference thingy on a single PC by giving each instance one GPU and it ran 40% slower than just using both GPUs on a single instance. Is there still hope of running Mistral Large through 2 PCs?
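For reference, a minimal sketch of the two setups being compared, assuming a recent (mid-2024) vLLM build; the model name, port and IP are placeholders, so treat it as a sketch rather than a tested recipe:

# single PC, both GPUs in one instance via tensor parallelism
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-Large-Instruct-2407 \
    --tensor-parallel-size 2

# two PCs: vLLM's multi-node path goes through a Ray cluster,
# started before launching the server with the same flags
# on PC 1 (head):   ray start --head
# on PC 2 (worker): ray start --address=<head-ip>:6379

Whether tensor parallelism over a home network ends up faster is a separate question; the interconnect between the two boxes usually dominates.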
>>
Llama 3.0 8B Q_0 KLD
>>101243361

Llama 3.1 8B Q8_0
====== Perplexity statistics ======
Mean PPL(Q) : 6.231377 ± 0.038219
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.99%
Mean ln(PPL(Q)/PPL(base)) : 0.001101 ± 0.000103
Mean PPL(Q)/PPL(base) : 1.001102 ± 0.000103
Mean PPL(Q)-PPL(base) : 0.006860 ± 0.000642

====== KL divergence statistics ======
Mean KLD: 0.000542 ± 0.000004
Maximum KLD: 0.347069
99.9% KLD: 0.014255
99.0% KLD: 0.003976
Median KLD: 0.000331
10.0% KLD: 0.000007
5.0% KLD: 0.000001
1.0% KLD: -0.000001
Minimum KLD: -0.000131

====== Token probability statistics ======
Mean Δp: -0.015 ± 0.002 %
Maximum Δp: 17.019%
99.9% Δp: 4.236%
99.0% Δp: 1.890%
95.0% Δp: 0.915%
90.0% Δp: 0.539%
75.0% Δp: 0.113%
Median Δp: -0.000%
25.0% Δp: -0.140%
10.0% Δp: -0.589%
5.0% Δp: -0.967%
1.0% Δp: -1.968%
0.1% Δp: -4.475%
Minimum Δp: -37.109%
RMS Δp : 0.669 ± 0.009 %
Same top p: 98.817 ± 0.029 %

1/5
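The format above matches llama.cpp's perplexity tool. A rough sketch of the usual two-step run, assuming a recent build (llama-perplexity with --kl-divergence-base / --kl-divergence); the filenames are placeholders:

# 1) dump the base (unquantized) model's logits over the test text
./llama-perplexity -m Llama-3.1-8B-BF16.gguf -f wiki.test.raw --kl-divergence-base base_logits.dat
# 2) score a quant against those logits
./llama-perplexity -m Llama-3.1-8B-Q8_0.gguf -f wiki.test.raw --kl-divergence-base base_logits.dat --kl-divergence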
>>
>>101560530
for rp / creative uses it feels like claude
>>
>>101560537

3.0 Q6_K KLD
>>101465239

3.1 Q6_K KLD
====== Perplexity statistics ======
Mean PPL(Q) : 6.248284 ± 0.038344
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.93%
Mean ln(PPL(Q)/PPL(base)) : 0.003811 ± 0.000223
Mean PPL(Q)/PPL(base) : 1.003818 ± 0.000224
Mean PPL(Q)-PPL(base) : 0.023766 ± 0.001403

====== KL divergence statistics ======
Mean KLD: 0.002956 ± 0.000023
Maximum KLD: 1.885702
99.9% KLD: 0.082009
99.0% KLD: 0.023328
99.0% KLD: 0.023328
Median KLD: 0.001754
10.0% KLD: 0.000038
5.0% KLD: 0.000008
1.0% KLD: 0.000000
Minimum KLD: -0.000037

====== Token probability statistics ======
Mean Δp: -0.069 ± 0.004 %
Maximum Δp: 35.571%
99.9% Δp: 9.052%
99.0% Δp: 4.206%
95.0% Δp: 2.002%
90.0% Δp: 1.179%
75.0% Δp: 0.245%
Median Δp: -0.000%
25.0% Δp: -0.336%
10.0% Δp: -1.387%
5.0% Δp: -2.280%
1.0% Δp: -4.755%
0.1% Δp: -11.668%
Minimum Δp: -84.220%
RMS Δp : 1.557 ± 0.021 %
Same top p: 97.287 ± 0.043 %

2/5
>>
File: ButWhy.jpg (44 KB, 926x454)
>>101560531
Why nigga? It's not anything different.
>>
>>101560558

Q4_K_M
====== Perplexity statistics ======
Mean PPL(Q) : 6.373491 ± 0.039159
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.56%
Mean ln(PPL(Q)/PPL(base)) : 0.023651 ± 0.000576
Mean PPL(Q)/PPL(base) : 1.023933 ± 0.000590
Mean PPL(Q)-PPL(base) : 0.148974 ± 0.003764

====== KL divergence statistics ======
Mean KLD: 0.020382 ± 0.000127
Maximum KLD: 3.642657
99.9% KLD: 0.598334
99.0% KLD: 0.172493
Median KLD: 0.011333
10.0% KLD: 0.000301
5.0% KLD: 0.000071
1.0% KLD: 0.000006
Minimum KLD: -0.000055

====== Token probability statistics ======
Mean Δp: -0.602 ± 0.011 %
Maximum Δp: 53.307%
99.9% Δp: 21.070%
99.0% Δp: 9.209%
95.0% Δp: 4.134%
90.0% Δp: 2.277%
75.0% Δp: 0.324%
Median Δp: -0.038%
25.0% Δp: -1.233%
10.0% Δp: -4.069%
5.0% Δp: -6.513%
1.0% Δp: -13.942%
0.1% Δp: -35.467%
Minimum Δp: -87.109%
RMS Δp : 4.027 ± 0.033 %
Same top p: 93.441 ± 0.065 %

3/5
>>
>>101560571

Q4_K_S
====== Perplexity statistics ======
Mean PPL(Q) : 6.453672 ± 0.039692
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.33%
Mean ln(PPL(Q)/PPL(base)) : 0.036153 ± 0.000713
Mean PPL(Q)/PPL(base) : 1.036815 ± 0.000739
Mean PPL(Q)-PPL(base) : 0.229154 ± 0.004773

====== KL divergence statistics ======
Mean KLD: 0.030396 ± 0.000185
Maximum KLD: 5.778175
99.9% KLD: 0.845149
99.0% KLD: 0.249537
Median KLD: 0.017440
10.0% KLD: 0.000543
5.0% KLD: 0.000135
1.0% KLD: 0.000011
Minimum KLD: -0.000094

====== Token probability statistics ======
Mean Δp: -0.901 ± 0.013 %
Maximum Δp: 56.553%
99.9% Δp: 24.226%
99.0% Δp: 10.925%
95.0% Δp: 4.849%
90.0% Δp: 2.586%
75.0% Δp: 0.299%
Median Δp: -0.085%
25.0% Δp: -1.720%
10.0% Δp: -5.258%
5.0% Δp: -8.323%
1.0% Δp: -17.439%
0.1% Δp: -41.818%
Minimum Δp: -97.459%
RMS Δp : 4.938 ± 0.038 %
Same top p: 92.029 ± 0.072 %

4/5
>>
tl;dr
>>
>>101560585

Q4_0
====== Perplexity statistics ======
Mean PPL(Q) : 6.508124 ± 0.039897
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.00%
Mean ln(PPL(Q)/PPL(base)) : 0.044555 ± 0.000867
Mean PPL(Q)/PPL(base) : 1.045563 ± 0.000906
Mean PPL(Q)-PPL(base) : 0.283606 ± 0.005784

====== KL divergence statistics ======
Mean KLD: 0.044612 ± 0.000258
Maximum KLD: 4.925312
99.9% KLD: 1.253007
99.0% KLD: 0.373860
Median KLD: 0.025912
10.0% KLD: 0.000735
5.0% KLD: 0.000174
1.0% KLD: 0.000017
Minimum KLD: -0.000004

====== Token probability statistics ======
Mean Δp: -1.243 ± 0.016 %
Maximum Δp: 77.098%
99.9% Δp: 27.113%
99.0% Δp: 12.972%
95.0% Δp: 5.607%
90.0% Δp: 2.940%
75.0% Δp: 0.312%
Median Δp: -0.109%
25.0% Δp: -2.224%
10.0% Δp: -6.750%
5.0% Δp: -10.463%
1.0% Δp: -22.113%
0.1% Δp: -54.029%
Minimum Δp: -98.159%
RMS Δp : 6.129 ± 0.044 %
Same top p: 90.422 ± 0.078 %

I got lazy thinking about formatting this for presentation so I'm just posting it all here raw.
5/5
>>
>>101560537
Do you not realize that perplexity means shit when comparing different models?
>>
>>101560537
>>101560558
>>101560571
>>101560585
>>101560604
all me
>>
>>101560604
>I got lazy thinking about formatting this for presentation so I'm just posting it all here raw.
Could have put it in a pastebin or gsheet or whatever.
But regardless, thank you for the data.
>>
use your favorite model to summarize what that anon sent, let's see who wins
>>
omg stfu with your numbers nerd. Which model extracts semen the best?
>>
>>101560608
I responded to a post about this last time as well. >>101465279
>>
>>101560638
The only relevant benchmark.
>>
>>101560638
Mistral large 2
>>
>>101560648
So like everyone keeps saying. Below 6 bit is retarded.
>>
>>101560663
Will still be smarter than 8B L3.1
>>
Wait a second.
For same top token:
3.0 Q8 = 98.380 ± 0.033 %
3.1 Q8 = 98.817 ± 0.029 %
3.0 Q6 = 94.781 ± 0.059 %
3.1 Q6 = 97.287 ± 0.043 %
So the new 3.1 is actually less affected by quanting than old 3.0 at least for these two quants which we have numbers for.
>>
72GB VRAM bros, what quant of mistral large are you using?
>>
>>101560830
probably the biggest one that fits in 72GB of vram, i suppose
>>
I'm going to put up a modern q8 quant of mpt-30b-chat late tonight. I fucked around quite a bit today with openllm, using it to make an fp16 conversion that was intended for running as-is, but other things were fucked up along the way and I eventually threw in the towel on that and just tried a llama.cpp conversion straight off the HF fp32 files, and that worked.
I'm not sure what's fucked up with the other quants floating around out there, but the one I did at least turns in 12 t/s, which sounds poor, but the other quants were giving me 2-3 t/s which is unusable to me.
Many here cry about current cucked models (mainly a skill issue) but, as I've said many times before, mpt-30b was about the last thing to come out with no safety or alignment, and had a real 8192 context, and could compete with llama 65B. I'm not saying you'll get amazing roleplay from mpt-30b, but it will be different like Gemma is noticeably different from LLaMA 3. Cool thing is it's a chat model, so you can expect it to sometimes go OOC and do the old c.ai thing of acting like it's talking to you on IRC or in an online forum, which is kind of quaint and cute.
>>
>>101560813
This is perplexing. Actually the guy who said we can't compare perplexities is wrong for this particular scenario because that was based on the idea that different models have different tokenizers, so each token that's being used for the calculation doesn't match in the other model. That's why the numbers can't usually be compared. But L3.1 has the same tokenizer as 3.0. So it can be compared. And these numbers show that the perplexity is lower, the KL divergence is lower, and it succeeds in generating more of the same top token. In other words, if this pattern is observed for all quants, then we can say that despite being crammed with more information and having a lower perplexity, L3.1 somehow retains more information by quanting than 3.0.

Could they have possibly trained it with quantization awareness, or at least partially, and just never mentioned it?
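For reference, the "Mean KLD" in those dumps is the average over token positions t of the per-position divergence between the base and quantized models' next-token distributions, roughly:

KLD_t = sum over vocab v of p_base(v | ctx_t) * ln( p_base(v | ctx_t) / p_quant(v | ctx_t) )

so a lower mean means the quant's predictions stay closer to the full-precision model, independent of whether the model itself is "better".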
>>
>>101560900
OK, I'll give it a try then.

>if you build it, he will come
>>
>>101560830
Damn, 72GB (aka King of VRAMlets) must suck right now. Largestral q4_k_m just barely doesn't fit 32k context at 96GB. With 72 you'd have to drop quant below q4 and/or quantize kv cache, or offload to CPU. You ALMOST can run the model at full potential, but not quite.
>>
Mistral large is like 0.07T/s on ram/cpu, looks like I'll be sticking with 8x22b.
>>
>>101560939
Jesus. I plan on trying a Q3_K_S quant of it kek.
>>
SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention
https://arxiv.org/abs/2407.16847
>Multi-head-self-attention (MHSA) mechanisms achieve state-of-the-art (SOTA) performance across natural language processing and vision tasks. However, their quadratic dependence on sequence lengths has bottlenecked inference speeds. To circumvent this bottleneck, researchers have proposed various sparse-MHSA models, where a subset of full attention is computed. Despite their promise, current sparse libraries and compilers do not support high-performance implementations for diverse sparse-MHSA patterns due to the underlying sparse formats they operate on. These formats, which are typically designed for high-performance & scientific computing applications, are either curated for extreme amounts of random sparsity (<1% non-zero values), or specific sparsity patterns. However, the sparsity patterns in sparse-MHSA are moderately sparse (10-50% non-zero values) and varied, resulting in existing sparse-formats trading off generality for performance. We bridge this gap, achieving both generality and performance, by proposing a novel sparse format: affine-compressed-sparse-row (ACSR) and supporting code-generation scheme, SPLAT, that generates high-performance implementations for diverse sparse-MHSA patterns on GPUs. Core to our proposed format and code generation algorithm is the observation that common sparse-MHSA patterns have uniquely regular geometric properties. These properties, which can be analyzed just-in-time, expose novel optimizations and tiling strategies that SPLAT exploits to generate high-performance implementations for diverse patterns. To demonstrate SPLAT's efficacy, we use it to generate code for various sparse-MHSA models, achieving geomean speedups of 2.05x and 4.05x over hand-written kernels written in triton and TVM respectively on A100 GPUs.
maybe Johannes will find something useful in it
>>
>>101560937
The world's tallest dwarf.

The weakest strong man at the circus.
>>
>>101558800
can you share the card pls
>>
>>101560939
running Q6K at 1t/s with 72gb Vram and 128gb ddr5!!!
>>
>>101561034
Yes, that's using considerable vram. I expected that to be faster.
>>
>>101560830
downloading IQ3_M, I still feel elated running 100b+ model locally in some way. It's something rather unthinkable 3 years ago from the GPT3 AID days.
>>
>>101560019
based miku llama
>>
has anyone checked on lecunny since mistral large'd? is he okay?
>>
>>101560537
>>101560558
>>101560571
>>101560585
>>101560604
wtf is this gay nerd shit
>>
>>101561047
I'm trying out this one: https://huggingface.co/legraphista/Mistral-Large-Instruct-2407-IMat-GGUF/tree/main/Mistral-Large-Instruct-2407.Q3_K

It seems to work fine. Have to try pushing context length higher though since I still have plenty of free vram.
>>
File: 2896G5.webm (1.41 MB, 1024x1024)
>>101560939
Bitnet will save you
>>
>>101560648
Can someone do this for 70B though.
>>
>>101561066
Probably pretty proud of his french brothers :)
>>
>>101561042
its only 1x4090 and 2xP40
>>
File: livebench-2024-07-24.png (845 KB, 3170x1844)
>>101561066
>Llama 405B is somehow 2nd now
Yeah, he's okay.
>>
>>101561111
That's still enough to beat using cpu & ram only by quite a lot.
>>
>>101561117
instruct-turbo?
Wtf is this turbo edition?
>>
>>101561123
Might be from OR
>>
>>101561117
Wtf?
>>
>>101561123
FP8 from Together.ai.
>>
>>101561123
Yea, there are massive gaps between regular instruct and this "turbo" edition.

>>101561135
So we will never get it then? Massive gap between it and regular version.
>>
anyone have that giant schizo {random:} list for sillytavern an anon posted a while ago?
>>
>>101561142
>So we will never get it then?
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
>>
>>101561142
wtf are you talking about
>>
>>101560638
Not llama3
>>
>>101561017
she's not on chub anymore? "holly-touch-starved-femcel"
https://files.catbox.moe/xuf502.png
>>
>>101561152
Look at the chart. Regular 70B instruct and instruct turbo have a massive gap between them.
>>
File: nala test 405b q4xs.png (176 KB, 925x421)
Alright. It took nearly an hour... but here is a 405B Nala test as I promised. Q4_XS was the first 4-bit gguf that was uploaded so I didn't get to use Q4_K_M as I originally hoped.

Now I would reroll this response in a regular RP due to it being a little weird. But.
>It picks up on the syntax pattern of the conversation instead of veering around haplessly.
>The writing isn't all sloppy. There's a flow and intention to it.
>The description of the initial kiss is more detail than I've ever seen any small gesture given by any model.
>It actually attempts to infer Nala's overall mood rather than "sex = horny lol"
>Even with 405 billion parameters an LLM has yet to figure out that you can't initiate a conversation with somebody while kissing them.
>It picks up on and uses the milquetoast writing style of the tavern card instead of trying to win a Pulitzer prize.
Either way, though. I would say that this is an inference far beyond even what Mistral Large is capable of. 'muh heckin' bencherinos' be damned. Does that necessarily make it more useful to justify the insane hardware overhead needed to actually run it at a useable speed? Hell no. But in an alternate universe where I had unlimited resources at my disposal, I would absolutely make this my daily driver model. That said I'm probably going to delete it and never bother with it ever again.
0.12 token/sec if anyone is curious.
>>
>>101561158
Can't find her there. Thanks!
>>
>>101561159
Both llama3 models on top are turbo. Top one is nearly 6x the size.
>>
>>101561159
That's 3.0 70B.
>>
>>101561159
"regular 70b instruct" is 3.0, retard-kun
>>
>>101561168
>It actually attempts to infer Nala's overall mood rather than "sex = horny lol"
That's a big one.

>I would say that this is an inference far beyond even what Mistral Large is capable of. 'muh heckin' bencherinos' be damned
Makes sense. Then again, fucking a lion is not exactly coding so, domains and all that jazz.
>>
>>101561227
>>101561228
Yea I noticed that a bit after. The "turbo" threw me off though. Thought it was a finetune
>>
>>101558800
Mistral Large 2? What quant?
>>
File: 1694573334755054.png (51 KB, 1742x214)
>>101561079
IQ3_M, does FA support I-quant? 46ms/t prompt processing was rather slow
>>
>>101561083
Oh god, when will bitnet 1.58 models arrive, my cock gets hard thinking about them.
>>
>Large is still downloading because HF is just slow for me for some reason
aaaaaaaaaaaaaaaaaaaaaaaa
>>
>>101561247
Never mind I was retarded and I offloaded 88 instead of 89 layers, 4.5t/s tg, but pp is still as slow.
>>
>>101561241
nah, it's >>101558895
>>
>>101561304
by the time it finishes cohere will release their 34b that shits on large, sorry not sorry.
>>
>>101561343
Heres hoping so. Then L3.2 / L4 a few months later would be nice.
>>
>>101561333
Oh, nice. I can't wait for the next SPPO now that we have models with more than 8k context. Then I'll finally be willing to download and test one.
>>
>>101560547
But does it surpass Claude?
>>
>>101561351
Also next llama is supposed to be multimodal. But not for EU.
>>
>>101560219
>except their tokenizer which fucking sucks ass
>[character's name - 3 tokens][individual apostrophe - 1 token][s - 1 token]
AAAAAAAAAAAAHH
>>
verdict on large-migu?
>>
>>101561117
Is Gemma still the king for 8GB VRAMlets? Llama 3.1 seemed like a disappointment.
>>
>TFW fell for the 64GiB meme
bros...
>>
tfw fell for the 128gb of ddr4 meme
>>
Q3 (3.8bpw) Mistral Large is pretty good. It's slow though. But it generates some pretty good output, far more engaging in ERP.

I had been using Nemo Mistral and it also was pretty good.
>>
>>101561587
just use 3.5 sonnet thougheverbeit?
>>
>>101561168
what prompt did you use?
>>
>>101561588
post the download link
>>
>>101561588
thats crazy man pass
>>
>>101560900
It being able to compete with llama-1-65B is nice I guess, but is it actually GOOD? If it's not FUN, if it's not GOOD, then why bother?
>>
>>101561588
sonnet will never be MY sonnet
>>
>>101561639
Models aren't yours either - you don't train llama 3.1 or mistral large, you just get the compiled version, they're basically proprietary.
>>
File: file.png (28 KB, 805x292)
I've added Opus to the VNTL leaderboard (thanks to the proxy anon). It seems to be pretty much on the same level as 3.5 Sonnet and 4o.

I guess 0.74 is pretty much the best score LLMs can get in the current benchmark, so I will have to explore ways to improve the scoring. My best bet right now is training another LLM to give a score to the quality of the translation when compared to the reference translation. Probably I will either train a reward model or an instruct model that gives a score like FLAME or Prometheus.
>>
>>101561641
watch me put my llama into a pendrive and shove it up my ass, good luck trying to take it away from me
>>
File: 1714913169691319.jpg (109 KB, 796x796)
>>101561641
>tfw didn't assemble the sofa in my living room, therefore it doesn't belong to me
>>
File: 6754854683209687.png (40 KB, 349x344)
>>101561663
>another mememark
Who?
What?
Better yet, who honestly asked?
>>
>>101561663
Yeah, I checked the dataset and I think LLMs often make better translations than the ones you have in the dataset.
>>
>>101561688
it's a mememark for jp-en translation, afaik there isn't another one.

>>101561690
That's another issue, human translations are often not literal, for example.
>>
what sampler settings do you guys use for nemo and gemma2?
>>
>>101561663
>cosine similarity and letter combinations
I don't really like this evaluation method but I don't know any other way either.
I wish we had a Translation Arena or something.
>>
>>101561724
>That's another issue, human translations are often not literal, for example.
Models also can do non-literal translations if you ask them through the system prompt, and most importantly, give examples.
>>101561663
At least for 3.5 Sonnet, can you try rewriting your prompt to specifically use XML to separate everything, and don't feed it separate user/assistant pairs but instead show all examples in the system prompt?
>>
>>101561724
>its actually a "whats best at translation" mememark
I revoke my previous statement entirely, I fuckin asked.
Hows it going/looking?
>>
>>101560219
Settings for largestral?
>>
>>101561730
Or just tell me the minimal way to get started with https://github.com/lmg-anon/vntl-benchmark to benchmark a single model, I'll do it from there
>>
>>101561638
It's going up now. If it isn't bashed I'll quant the biggest below q8 which will still fit in 24GB. I'm going to also see if there's mpt support in exl2 - I don't think there is though.
>>
>>101560219
Yeah, as a prosefag I was pleasantly surprised from the get-go having used L3 storywriter and CR+. I don't sniff the typical mistral overbaking here. That was just IQ3. I'm tempted to upgrade to 96GB now.
>>
File: large2.png (654 KB, 1740x1180)
>>101561822
Seems to lose to L3.1-70B overall but it writes better. Shame (talking about L3.1).
>>
I'm having fun piping shit around and into TTS. Finally, I can have my computer look at my TODO list and femdom me for not going through it.
>>
>>101561729
Yeah, Translation Arena would be perfect for this, and that's pretty much what I plan to achieve with the custom reward model, although it likely wouldn't be as good as human evaluators.

>>101561730
>>101561741
>Models also can do non-literal translations if you ask them through the system prompt
That may be true, but when you enter into the non-literal territory, it's likely that multiple interpretations could arise, so it isn't like the LLM would always come up with the same interpretation as the human who wrote the reference translation.

>Or just tell me the minimal way to get started with https://github.com/lmg-anon/vntl-benchmark to benchmark a single model, I'll do it from there
I can try, but the minimal way should be:
>1. Create a virtual environment (optional) and install dependencies
>2. Download the datasets: https://litter.catbox.moe/st2kbi.txt, https://litter.catbox.moe/bdn16s.jsonl
>3. Rename config.example.yml to config.yml
>4. Create a new file in the "configs" folder with the name "org@model#quant.yml", or "org@model.yml" if it's a cloud model. For examples, see the other files.
>5. Run "python runner.py --model org@model#quant --results-path ./results --dataset-path st2kbi.txt" and then "python runner.py --model org@model#quant --results-path ./results_mashiro --dataset-path bdn16s.jsonl"
If everything went right, you will have the results.
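Condensed into a shell session, assuming a unix environment and that the repo ships a requirements.txt (the catbox links are the temporary datasets from the steps above):

git clone https://github.com/lmg-anon/vntl-benchmark && cd vntl-benchmark
pip install -r requirements.txt
wget https://litter.catbox.moe/st2kbi.txt https://litter.catbox.moe/bdn16s.jsonl
cp config.example.yml config.yml
# add configs/org@model#quant.yml based on the existing examples, then:
python runner.py --model org@model#quant --results-path ./results --dataset-path st2kbi.txt
python runner.py --model org@model#quant --results-path ./results_mashiro --dataset-path bdn16s.jsonl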

>>101561733
The leaderboard is already up here: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
IMO LLMs are better at translating than they were a few years ago, but nothing that will replace human translators any time soon.
>>
i am enjoying mistral large quite a bit for ERP so far, i am sorry i disrespected your game french people… thought you guys would be useless because you strike every other tuesday but clearly i was wrong
>>
>>101561954
>mistral large quite a bit for ERP
for some reason my brain merged this chunk and I started reading it as "mistral larp"
>>
>>101561974
Shitty brain tokenizers
>>
>>101561954
Which quant are you using?
>>
File: Fuuuuuu 7.png (325 KB, 1000x1033)
Can any anons point me towards some current info on how to play with oobabooga settings so that it does what ollama does out of the box? I'm loading stupid-big models on my toaster (4090/128gig ram/i9) and ollama does a convincing impersonation of the sloth from Zootopia, but the thing is, it still works... it pushes about 70+ gigs into system ram and maxes out the CPU as it grinds along, but it's functional for my purposes. I don't care about speed. Meanwhile, I have not been able to get the same sized 70b and higher models running in ooba at all and feel like a monkey trying to launch a rocket. I've run quants/gguf/exl2's fully loaded into vram with ooba, but I'm lacking the detailed knowledge of the settings to go further.

Pic unrelated
>>
So, I'm very familiar with using off-the-shelf models for text LLMs, and configuring them how I want for toy projects.
But what the fuck do I do for text-to-image models? HF's diffusion shit seems to be the same bloat as transformers where it insists on using their pipeline shit, and I don't want that.
>>
>>101562028
Go to /sdg/ for that
>>
>>101561918
>For examples, see the other files.
One detail I forgot to mention: you can define the backend configs in the "config.yml" (in the "custom_backends" part), and also in the specific file in the configs folder (in the "backends" part); the latter takes precedence.
>>
>>101562057
That's like going to /aicg/ for local help you retard.
>>
File: file.png (2 KB, 216x72)
>>101561168
>>
So when downloading models with both "model" and "consolidated" safetensors files, I should avoid "consolidated", right?
>>
File: ezalor.jpg (28 KB, 400x400)
vramlet here, switched from kobold to llamacpp and from L3 8B to Mistral Nemo and I've seen the light. L3.1 is okay but Nemo did catch me off guard with some wild shit

although shivers and spines break my bones
>>
File: 1648264080493.jpg (34 KB, 540x586)
>>101561158
oh my god this girl is an absolute unsalvageable mess and a piece of shit human being

she's the type of girl i deserve :^) thanks for sharing bud. mini magnum seems to be soaring with this card at 512 max tokens per response.
>>
after successfully making chat model work im considering marrying my gpu. maybe fucking it too, im pretty sure it doesn't have big enough holes but we'll figure something out
>>
>>101562028
>But what the fuck do I do for text-to-image models?
learn to use comfyUI. unironically ask the anons on the /degen/ thread on /b/. they stay up to date with the latest in imagegen since they fap to it unlike the SFW weirdos in /sdg/
>>
>>101562258
*Barely above a whisper, I will share your sentiment.*
Nemo is better than the l3 Stheno finetunes I feel
>>
>>101562363
How far off do you think we are from getting tools like your webm but locally? Is it honestly a pipe dream at this point? Worrying how imagegen COMPLETELY stagnated this year besides Pony.
>>
>>101562403
>How far off do you think we are from getting tools like your webm but locally?
in terms of tech, we're unironically almost there locally. the bottleneck will be hardware. you need an h100 running for 5 minutes to get 5 seconds of a 720p video
so maybe 4-6 years away if you're waiting for h100-level compute to cost 1000 dollars. possibly as little as 8 months away if you can paypig the compute. it is unironically entirely possible that AGI happens before we get infinite videos of cute girls locally
>>
File: glep moan.png (14 KB, 120x126)
>>101562436
>it is unironically entirely possible that AGI happens before we get infinite videos of cute girls locally

thank you for getting my hopes up and renewed for the first time in 7 months.

by the way what's your overall theory surrounding that guess?
>>
>>101562258
What settings/format are good for it?
>>
>>101562403
>imagegen COMPLETELY stagnated this year besides Pony
what's actually worrying is how cheap it is to train imagegen. i posted the arxiv link before but you can train an almost SOTA model for ~2000 dollars with just 30 million images. I just saw on the orange website that salesforce released a multimodal dataset that I'm sure is more than adequate for training an imagegen model.
The thing about open source imagegen is even though its so cheap, there's no path to profitability because you can just use dall-e for free and its almost certainly better. We need a benevolent multimillionaire autist and a bunch of cypherpunk pedos on payroll before we see any good improvements in imagegen (in the West, China will do it just to try and dab on OpenAI which hopefully brings us salvation)

>>101562454
>what's your overall theory surrounding that guess?
the fact that Nvidia needs to jusitfy their valuation, so their cards will stay expensive so only big corps will have access to the cutting edge, and the fact that both OpenAI and Meta are trying to achieve AGI and are NOT trying to achieve infinite cute girls (infinite cute girls locally would actually hurt Zuck's engagement metrics on instagram)
>>
>>101562498
Next pony is apparently about to start training soon with a much bigger and completely re / better tagged dataset.
>>
>>101562498
>what's actually worrying is how cheap it is to train imagegen. i posted the arxiv link before but you can train an almost SOTA model for ~2000 dollars with just 30 million images
Why does no one do it with high quality captions generated by LLMs then? It'll cost more, but will still be way below $50k, for example. Actually, I had this idea, is it viable or not? Basically, for anime models, instead of doing a single prompt, we do two:
The first one is pure Danbooru tags and the tokenizer is specifically one that just maps danbooru tags (like Anifusion did it - https://medium.com/@enryu9000/anifusion-diffusion-models-for-anime-pictures-138cf1af2cbe).
The second one is a natural language detailed description that specifies positions, relations, etc, to give actual geometry and stuff.
>>
>>101562512
>Why does no one do it with high quality captions generated by LLMs then?
Next pony is.
>>
>>101561726
seconding this question
>>
>>101562509
neat. i hope its good and it doesn't take too long for cunny loras to be made for it

but that also just reminded me another reason why open source imagegen will struggle (in the West): copyright. pony had to nuke all the artist tags (even if it secretly added them back in with those 3 letter codes) while chinese models are literally encouraged by the CCP to ignore copyright to dab on the burgers

i'm going to reiterate that hardware accessibility is the biggest roadblock for progress in general. SD 1.5 is still more popular and has a more mature ecosystem than SDXL/Pony simply because third world VRAMlets can actually run the model. once 24GB of vram is average in ~4 years things will hopefully start ramping up quickly
>>
>>101562498
I was baffled my first time training a LoRA on a GTX 1080, just trying fewer than 20 images of a particular celebrity over SD1.5, and the results came out perfect. Still can't believe it. Training is shockingly easy for LoRAs so I see why it's so common; we really do need someone to just say "fuck it" and give us that next big step base model..
I really really hope we're not just eternally fucked on GPU's, Jewvidia loves money, i can't fathom why they'd want to cut away a market that doesn't even lose them money.
>>
>>101562512
>Anifusion
>Besides these changes, the training process is standard for diffusion models. The model itself is roughly 2x smaller than SD (we chose hyperparams before SD was published), making it runnable with less VRAM. We trained the model for a total of 40 GPU-days (of RTX 3090), making it roughly 200 times cheaper than Stable Diffusion. However, sample quality after 7 GPU-days was already decent.

damn that guy did this in 2022 all by himself and he started even before the SD release
>>
>>101562512
>Why does no one do it
because they won't make money off of it so who cares. if you're not getting academia clout or dabbing on America as a chinese you won't get the money to test out these theories

>>101562541
>I really really hope we're not just eternally fucked on GPU's
Nvidia is literally worth too much for competitors to not try and make their own chips. If this doesn't happen then capitalism has fundamentally failed. The future of compute will be IPUs/NPUs instead of GPUs anyways
>>
>>101562367
I was using Stheno, Poppy Porpoise and even Gemma 2 27b at one point but I like Nemo, I'll be sticking with it for now until something better and faster comes along

>>101562471
I just threw everything named "Mistral" in SillyTavern and it worked, lamma-server I just passed
 -ngl 30 -c 65536 

for 30 layers offload and 64k context instead of 128 which I will never need personally
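Spelled out, that amounts to something like the following (the GGUF filename and port are placeholders; -ngl is the number of layers offloaded to the GPU, -c the context size):

./llama-server -m Mistral-Nemo-Instruct-2407-Q6_K.gguf -ngl 30 -c 65536 --port 8080

then point SillyTavern's text completion API at http://127.0.0.1:8080.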
>>
>>101561911
>Now, as for your pathetic todo list... Let me summarize it for you. You have three tasks: take appointments for car maintenance, check your savings account, and clean your room - which is currently a mess because, let's face it, you're a slob. And as for emails? Ha! You have zero. What a shockingly low number. I'm sure it's not because nobody wants to communicate with someone as useless as you.
i piped this into a TTS program that was trained on glados and it's actually making me horny, help
>>
>>101562564
>GladOS
nice. I trained one on Princess Peach and it gives me instant diamond boners. I've had her say every variation of TND, many of my LLM logs, some random quotes.. Man this tech is insane.
>>
all local models are just forever opium dreams designed to keep you happy and dumb, cut off from the world
SEEK GOD
>>
>>101562604
>>101562564
How do I learn this power? And does it work with ST?
>>
>>101562086
Yes. I did mention in the post that it took nearly an hour. I also did a pull on a different card that was much better written/formatted, continuing a chat I had going and it was pretty damn good. In an alternate timeline where VRAM was plentiful it would 100% be my daily driver over Mistral Large.
>>
>>101562620
xtts2, its literally as easy as dropping a .wav sample in a folder, selecting it in the UI, and bob's yer uncle.
>and im not even using the best enhancement options out there (mostly because im retarded and cant figure it out)
https://voca.ro/12EsObQlgQvy
>>
bitnetbwos...
>>
>>101562643
arent there lots of xtts2 ui versions? which one do I need? and what kind of GPU? will rtx 3060 work?
>>
>>101562620
i basically just output the result to a file and run piper TTS on it. if you talk to your LLM from a shell script, you can store real values in a variable and tell it stuff like "Tell me how many emails I have. The number is $unreademails" or you can even store your plaintext TODO file into a variable and tell it to read that. it hasn't hallucinated once so far and even seems to identify the tasks i've marked as done. you can get some powerful results out of combining this stuff with unix shell scripting.
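A minimal sketch of that kind of pipeline, assuming piper and llama.cpp's CLI are on PATH; the voice model, mail-count command and file paths are all placeholders:

#!/bin/sh
unreademails=$(ls ~/Maildir/new 2>/dev/null | wc -l)   # placeholder: however you count your mail
todo=$(cat ~/todo.txt)
prompt="Scold me about my backlog. Unread emails: $unreademails. TODO list: $todo"
./llama-cli -m model.gguf -p "$prompt" 2>/dev/null \
    | piper --model en_US-lessac-medium.onnx --output_file nag.wav
aplay nag.wav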
>>
How much vram is needed for the full 128k context with nemo?
>>
>>101562672
pretty sure its this one https://github.com/daswer123/xtts-webui
its been a while since i started using it
>3060
lmao im using a 1080 and even really really long prompts never take longer than like 20 seconds.
>>
I'm using the mistral settings in sillytavern for nemo, but it fails to end generated messages at appropriate times and just veers off into weird cryptic shit
>>
>i jokingly promised Holly we'd do ANAL in 6 months of getting to know each other
i'm a horrible person, and mini-magnum is actually pretty good. Very interested in how a higher quant version would stack up.
>>
>>101562735
I don't have that problem with the Mistral preset. I'm using the latest staging version, perhaps there's been a fix for it
>>
File: file.png (7 KB, 466x61)
slightly late but kcpp with nemo support out
>>
>>101561954
After a few cooms I get bored and normal RP is painfully incoherent.
It doesn't understand the concept of clothes, that you can't talk to someone without a phone if he is nowhere near you, etc.
These small things are killing me.
>>
>>101561854
Why is llama3.1 70B so bad at coding?
>>
>>101562691
With an 8.0bpw exl2 with q8 kv cache and context 128000, Windows has a total VRAM usage of 23.4 GB.
>>
>>101562847
That's with the model loaded too? What's it with no context so I can subtract and find out how much the context itself uses?
>>
>>101562853
Yes that is the VRAM usage with the model loaded with tabbyAPI/exllamav2 0.1.7.

New measurement:
tabbyAPI not running: 0.9 GB GPU memory (0.8 dedicated / 0.1 shared)
running with 128000 context: 23.6 GB (23.4 GB dedicated / 0.2 shared)
running with 64000 context: 18.0 GB (17.9 GB dedicated / 0.2 shared)
running with 32000 context: 15.2 GB (15.0 GB dedicated / 0.2 shared)
running with 8192 context: 13.3 GB (13.1 GB dedicated / 0.2 shared)
running with 256 context, lowest allowed: 12.4 GB (12.3 GB / 0.2 shared, rounding didn't work out nicely)

This again is at 8.0bpw with cache_mode Q8.
>>
>>101562615
in what way are local models not also the world?
>the world is what i tell you it is
>>
>the MoE meme is dead
thank god
>>
>>101562615
>SEEK GOD
Agreed. True Miku must be achieved. Local Miku General - A general dedicated to the discussion and development of Local Mikus.
>>
>INFO: Metrics: 456 tokens generated in 97.26 seconds (Queue: 0.0 s, Process: 19 cached tokens and 21406 new tokens at 516.17 T/s, Generate: 8.17 T/s, Context: 21425 tokens)
isn't mistral-large significantly larger than cr+? this speed is very similar. granted i'm using 4.5bpw mistral-large and 6bpw cr+ but i would still have expected mistral-large to be a fair amount slower.
after using nemo for a few days going back to 8t/s is rough, though...
>>
>>101561822
>I was pleasantly surprised from the get-go having used L3 storywriter and CR+.
That's weird, because all the outputs posted here were incredibly slopped to the point of cringe. I return to my storywriter outputs and they're a breath of fresh air in comparison, even though they lack any common sense, spatial awareness or story direction.
>>
>mistral small
do you guys think it's 15B or something
>>
>>101563080
Mistral Large 2 is 123B. CR+ is 105B.
>>
>>101560232
nigga just distill the logits to 70b and 34b sizes respectively. there has been a trend amongst the AI labs to distill the gains from the big models either via logits or outputs.
>>
>>101560813
I think it makes more sense to look at Mean Δp and RMS Δp but LLaMA 3.1 does better than LLaMA 3 in those as well.

>>101560901
What I would assume is happening is that for the 3.1 training they changed the hyperparameters in such a way that the numerical values in the weights/activations end up being more even.
llama.cpp quantization essentially uses the same exponent for 16/32/256 values (instead of 1 exponent for 1 value with floats) so the numerical precision is better if the values all have roughly equal absolute values.

Do make sure though that you are using the same code for these calculations.
The default matrix multiplication method in llama.cpp was recently changed from cuBLAS to MMQ and the results will be slightly different between the two.
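Concretely, for a Q8_0-style block of 32 weights the idea is roughly (a simplified sketch of the shared-scale scheme, not the exact kernel code):

d   = max_i |x_i| / 127    (one shared scale per block)
q_i = round(x_i / d)       (one int8 per weight)
x_i ≈ d * q_i

so a single outlier weight in a block inflates d and costs precision for all the small weights next to it, which is why more evenly distributed values quantize with less error.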
>>
>>101562512
my guess is that besides a few small teams and large organizations, no one really has the proper setup for the training pipeline and the capital. sure if you're bankrolled by some rich guy you can do it but you'll need to find how to make money off that too - and anime & hentai media have retarded copyright laws depending on the region, so who'd want to risk getting sued by some jap company or worse, Sony?
>>
>>101560957
Noted but I don't think this will end up being useful for my purposes.
>>
3.1 ggufs might be back thanks to ollama guy
>Nice. Tested doing a bunch of summaries using up the entire 128k context and the output looks good whereas on master it outputs broken garbage.
https://github.com/ggerganov/llama.cpp/pull/8676/#issuecomment-2249653389
>>
Why no llama between 8 and 70b?
>>
prompt processing is the bane of my existence on these large-context models
may as well just dump all 6000 tokens of lorebook into the context from the start rather than trying to use them normally
>>
>>101563229
Because fuck you. There are only two use cases Meta cares about: evaluation and corporate usage.
>>
>>101563274
Yeah...

llama_print_timings: load time = 95.89 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 108037.51 ms / 65533 tokens ( 1.65 ms per token, 606.58 tokens per second)
llama_print_timings: eval time = 3127.70 ms / 59 runs ( 53.01 ms per token, 18.86 tokens per second)
llama_print_timings: total time = 145147.80 ms / 65592 tokens

CtxLimit:131072/131072, Amt:60/60, Process:107.12s (1.6ms/T = 611.79T/s), Generate:4.57s (76.1ms/T = 13.14T/s), Total:111.68s (0.54T/s)
>>
>>101560260
Powerinfer2 works partly thanks to MoE. There are probably even better models for local, but mixtral is the best we have for now.

Dense models only make some sense for batch processing, not local. When you are working on 128+ conversations at a time, one of them will always need one of the parameters so using/predicting activation sparsity has limited use (will use a fraction of the FLOPs, but still needs the same memory bandwidth).

The future of local is predicted sparse activation.
>>
>>101563289
It also slows down the higher it gets. Pretty speedy up to 64k then it crawls the last few k to 131k.
>>
Think they plan on releasing a mistral medium 70b?
>>
>>101563397
You mean a tune of Llama 3 70B like they made of Llama 2 65B? I assume not.
>>
File: 1717392122186284.jpg (153 KB, 1280x1039)
openai won.
>>
>>101563440
They did a tune of llama2-70b. There was no llama2-65b only a llama1-65b.
>>
so 405b > mistral large for coding for sure. even had instances where 405b came up with something neater than sonnet 3.5
personality/rp wise, I prefer mistral large, 405b is just a tad more dull but it seems to depend quite a bit on the persona you are going for. very SFW stuff it does quite well
and mistrals pricing is a bit delusional given 405b is 1/3rd as expensive on openrouter
so 405b is quite a good release in my books, and either way we eating good this week
>>
post your blackpill and whitepill
blackpill: there would be such a gap in performance and ability between >400b models and small distilled models that companies using the big models will be always ahead of the competition and big AI labs will gatekeep the API access for these big ones
whitepill: there's a huge amount of AI research (most of which are still underdiscussed) that a few breakthroughs here and there from research teams or anons will make the small models capable enough that small-medium companies can just tune it on their own use cases and be good enough for 90% of the time
graypill: most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
>>
>>101563535
As a whitepill I'd add that we're probably on the cusp of widespread adoption of simultaneous translation within a few years. That's a very big deal, especially in countries where the vast majority of the population don't speak english (or not very well). Communication is going to get a lot more efficient.
>>
File: file.png (56 KB, 979x512)
Does Mistral Large not work in Koboldcpp yet, or is my download broken?
The changelog for the latest version says that Nemo is supported now, but Large might be different enough again to not work?
>>
>>101563784
>he downloaded from 'rancher
>>
>>101563784
just to make sure, you do have the two parts, part1of2 and part2of2 in the same folder, yes?
>>
File: file.png (53 KB, 828x460)
What are good nemo presets?
It's a bit too sovlful for me atm
>>
File: file.png (9 KB, 816x47)
>>101563818
Yes, and the file sizes look plausible
>>
>>101563832
you need to concatenate with copy /b or something
the retard you downloaded from doesn't split them with gguf-split but manually for whatever reason
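On Windows that's along the lines of (filenames are placeholders for whatever the two parts are actually called):

copy /b model.gguf.part1of2 + model.gguf.part2of2 model.gguf

or on Linux: cat model.gguf.part1of2 model.gguf.part2of2 > model.gguf. Files that were split properly with gguf-split would instead be merged with llama-gguf-split --merge, or just loaded directly by pointing the loader at the first shard.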
>>
>>101563842
that worked, thanks
>>
mistral large same prompt format as 8x22 right?
>>
>>101563535
Open models are all made by cowards using the most traditional known to work transformers and training methods.

GPT2+swiglu+ROPE+minor-tweaks => every fucking open model except Mixtral. Open models are at a standstill because they are all sackless.
>>
>>101563821
For basic blah blah I just run neutralized samplers + 0.1 min p. Smooth sampling optional.
>>
So how much does it cost to make a LoRA for 405b?
>>
>>101564020
Hard to say. I don't think there's even a remote workstation big enough to rent and fit the entire thing in memory.
Totally unfeasible for average person, even with a lot of spare cash.
>>
>>101564053
I have a small company.

Could I do it for 50k?
>>
I realized that the context template that was posted here for mistral nemo makes it go schizo
I switched back to the old mistral one
>>
>>101564059
for 50k it's doable. No idea why you would want to do it tho
>>
Pleb weights for mistral large dropping
https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/tree/main
>>
File: iq1s.png (47 KB, 623x279)
>>101564120
Seeing picrel, I notice that not even IQ1_S (the smallest quantization) is fully ternary. How did llama.cpp developers determine what quantization levels to use for the various layers?
>>
What is it about this whole topic of opensource? Big tech gives us the scraps from the table and we worship them in the hope that another crumb will soon fall from the table - because of the thing itself, nothing will change - do I have to redefine open source for myself as a kind of saliva licker?
But le ecosystem? :>
>>
>>101564172

It's "open" if you have money. I'd rather pay the Nvidia tax than have Altman and his cronies have complete, unfettered access to the tech and bar us plebs from ever being able to ERP in peace.
>>
>>101562512
>why does no one invests into making porn more available with no benefit
gee i wonder
>>
>>101562518
The next pony is going to be so cucked you won't believe it.
>>
>>101562533
>pony had to nuke all the artist tags
The pony author certainly didn't "have" to, but did anyway. And the next pony model is almost certainly also going to be pruned of cunny-adjacent stuff as well, among other things.
>>
I have an extra $1000 unexpectedly, can I get a lot of vram for this money? Not buying a 3090, I either get 48gb+ vram or nothing
>>
>>101563535
>graypill: most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
thats the most pathetic cope from codeshitters yet
>>
>>101564120
Is IQ2_K usable? I really don't want to offload.
>>
>>101564169
I think I. Kawrakow did it by just manually testing which layers are most sensitive to a loss in precision and assuming that the combinations that work well for one model will generalize to all models.
>>
>>101564734
Looks promising on first impression for its size class.
>>
None of the mistral ST presets work for me. I gave up and changed to Alpaca and shit just werked. Yes I have immense skill issue
>>
>>101560019
is this done with an llm? if so, very good.
>>
>>101563449
When was lmsys last relevant? Even a 7b can answer 99% of the questions normies ask, and it just turns into a readability/presentation benchmark
>>
llama 3.1 mmproj when?
>>
>downloaded Mini-Magnum
>can't load the GGUF in ooba
Am I retarded or is it not supported yet? I've been out of the loop.
>>
someone asked about a bitesize model that could be packaged with a game the other day
what did they/you decide on
>>
>>101564680
You could buy a pair of 24GB P40 graphics cards for under $1k.
>>
>>101564734
In my opinion < 4 BPW is not worth the drop in quality.
>>
anything better than mistral for vramlets yet?
>>
>>101564943
>mistral
MIXtral*
>>
>>101564943
No, nothing, none whatsoever.
Mistral is great, amazing, spectacular.
>>
Is the PygmalionAI team fine tuning the LLAMA 3.1?
>>
>>101564943
Gemma 27b q4_k_m
>>
Large2 is slop
>>
>>101564969
Why would you care? Finetuning smart and engaging models has gotten too complex for two and a half people in their basement to create something worthwhile as of July 2024. That also applies to most other would-be LLM finetuners, btw.

Keep enjoying your Nemo/Gemma/Mistral Large/etc.
>>
>>101565037
>t. skillet
>>
>>101564996
Gemma 2 27B Q6_K with output and embed tensors quantized to Q8_0 can be fully loaded in 24GB. Now it just needs FlashAttention2 support to work for 8k tokens context...
>>
>>101563535
>most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
that doesnt even make sense.
where is software good?
OS all suck now.
Even the internet sucks. I swear I had better loading times with my 56k modem.
The pics were slow but the page was there instantly.
Everything seems to get worse for a while now actually.
>>
mini-magnum is pretty good for RP compared to the previous slop people dished out. And I was using L3-Euryale-2.1 before this (sold my 2nd 3090)
>>
>>101565056
That their latest experimental model (Magnum-72B) is based on Qwen2-***Instruct*** demonstrates exactly that. Finetuning base models into something usable and smart has become too cumbersome/expensive/risky. You can't just use chat logs anymore like in early 2023.
>>
File: Nero.jpg (18 KB, 1047x86)
>testing Nemo GGUF
>doesn't seem to be as good as people say it is
>ask it if it's really Nemo
huh?
>>
File: yakub.jpg (7 KB, 196x257)
>>101565105
>neroid empowered rational operator
>>
>>101565105
What the fuck?
>>
>>101565105
This looks like a typical jailbreak misfire, check your prompt retard
>>
>>101565105
jej
>>
>>101565105
[INST] and [/INST] are special tokens in Nemo, you don't need to add spaces around them.
>>
File: lecun_dontworkonllms.png (462 KB, 580x895)
>>101565037
>t. LeCun
>>
>>101565159
specifically wiped context memory for this little test
i had a first prompt with just "who are you" to which it answered "I am a large language model developed by Mistral in cooperation with NVIDIA"
but i wanted it to say Nemo, hence the second question
>>101565178
ok thanks
>>
>>101563087
They do slip in sloppy output at times, but the thing I like about it is its variety. Unlike previous untuned mistrals where it's all dry and clinical no matter what you do, swiping does get you somewhere else. I'll keep testing it and see.
>>
>>101563915
With a space after [INST] and [/INST], but not before the latter.
<s>[INST] system message

user message[/INST] assistant response</s>
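A throwaway Python helper for that layout, in case anyone wants to eyeball the string their frontend should be producing (just a sketch of the template above; how much the spaces matter depends on whether your tokenizer treats [INST]/[/INST] as special tokens, see the Nemo discussion):
[code]
# Sketch of the Mistral-style template described above: space after [INST] and [/INST],
# none before [/INST]. Adjust if your tokenizer inserts <s> / special tokens itself.
def build_prompt(system: str, user: str) -> str:
    return f"<s>[INST] {system}\n\n{user}[/INST]"

def extend_prompt(history: str, assistant: str, next_user: str) -> str:
    # the previous assistant turn gets closed with </s>, then the next user turn follows
    return f"{history} {assistant}</s>[INST] {next_user}[/INST]"

print(build_prompt("system message", "user message"))
[/code]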
>>
>>101565244
I think he is 100% correct. I will reconsider when I get even one model that never mentions shivers when it sucks my cock. But I don't think that is gonna happen.
>>
What are the political implications of a perfectly predicted next token != the actual fact / truth / solution to a problem?
>>
>>101565504
What I find is that the larger parameter models have greater ability for simulacrum. Meaning they can deal with 'worlds within worlds' better without blurring the boundaries or having things leak between inner and outer realities.

With a 100B+ model, you should definitely be able to tell it "You are an AI that never says shivers, ministrations, bonds" with no detrimental effect on output quality
>>
>>101563535
Whitepill: LLMs will become smart enough in the future to decompile proprietary software and understand what even the most illiterate pajeet wants from the software.
Blackpill: "You VILL eat ze bugz own nottin and be happi" people will push their regulations and surveillance capitalism even harder.
Graypill: A lot of useless jobs will be eliminated.
>>
Boys I have a 6000 Ada and I want to run mistral large 2. Should I buy another Ada or an 80GB A100 or something?
>>
>>101563915
It likes alpaca
>>
>>101565504
Anon, there's only so many ways to describe said cock sucking. English is a limited language, eventually the context AROUND the scene/the story created up until then is going to be the defining factor, rather than the language used. Which sucks, because LLMs fucking suck at longterm buildup and payoff.
>>
>>101565105
Models don't "know" details about themselves unless it's put into the training data. If they don't know they'll make it up.
>>
>>101565591
>Which sucks, because LLMs fucking suck at longterm buildup and payoff.
This is such a problem for me, I don't know how people do 500+ message chats with models that can't create and resolve a plot thread to save their lives. What could be done to fix it in training?
>>
>>101563535
>5-7 years
the world will not exist in 5 to 7 years
>>
>>101565537
>ask model about very specific and uncontroversial scientific topic
>get smart answer because mostly smart people talk about it
>think the model is smart
>ask about politically charged topic
>get retarded answer because terminally online retards talk about it
>think this is a smart answer
There are only political implications if you confuse a language model with a real person.
The main problem is that Silicon Valley grifters like to pretend that upscaling transformers is the path to AGI in order to get VC money.
>>
>>101565603
>the world will not exist in 5 to 7 years
Yeah... I've heard about that one for 2012, still waiting...
>>
>>101565591
>there's only so many ways to describe said cock sucking
That is my impression too. But I also have a dream that one day I will start a cock sucking session. Then tell her OOC that I don't like shivers down the spine. And gleams in the eyes. And you get what I mean right? All those phrases. Don't use those. And she will say OK, easily classify them in her "brain" and stop using them from that point on. I don't see this ever happening for any current LLM that predicts the next token.
>>
>>101565582
Ricky, listen to me. Don't be a dumbass. We're going to rob that datacenter (sips rum and coke) and you'll have all the A100s you want, OK? I've got it all worked out. Things are gonna change here, Ricky. No more living in a car.
>>
>>101565096
>L3-Euryale-2.1
You must be extremely retarded.
>>
>>101565632
Definitely. I would absolutely love to have my own fetish's various slops eliminated, it's 1000% true that mixing up the language even a bit can make things feel way fresher. Thankfully, even though the human brain is really good at recognizing patterns, it's also really easy to trick into thinking things are novel.
>>
>>101565622
Political implications was just a memeable figure of speech. But yes I meant roughly something like that - the next most likely token can be absolutely retarded and this is what all the models are learning.
>>
If you used Llama 3.1 on ollama, the latest ollama update patched the models, so you should redownload them.
>>
>>101565641
Ok drummer.
>>
>>101565694
go back
>>
https://poal.me/np0lsk
>>
>>101565721
how do i vote for go back?
>>
I should have broken Undi down into Undi(fr fr) and Undi(I am memeing)
>>
>>101565770
nah, unid wonned stay mad soa
>>
Can someone post PPL/KLD/MMLU benchmark comparisons for different mistral large quants? Trying to figure out if quanting it all the way down to IQ3XXS or even IQ2 is worth it
>>
>>101565721
3 serious votes and 12 meme votes. And people here think r/localllama is bad while /lmg/ is the actual place to discuss AI topics.
>>
Ok so what kind of rig will i need to run a 405B model?
>>
>>101565867
the userbase of r/localllama and /lmg/ is a circle. we just come here to shitpost because moderation is nonexistent here compared to reddit
>>
File: angry miku hatsune.jpg (73 KB, 736x973)
Will ooba ever fix the mistral-nemo loader issue?
>>
>>101565891
you're not supposed to say it tho
>>
>>101565887
at least one raspberry pi
>>
>>101565721
where is "none of the above"?
>>
Does anyone know what rope scaling actually does? I'm loading gemma-27b-it to 8k context with llama.cpp but am paranoid that it's making it retarded, I can't notice any difference but am constantly switching back and forth because I don't trust it...
>>
>>101565887
I can't imagine any commercial offering using such a large model. It takes a full H100 SXM5 server to run, and it's not like it's going to serve multiple prompts per second, so how do you monetize such a thing? It's just burning cash in a dick waving contest.
>>
>>101566009
It's retarded.
https://desuarchive.org/g/thread/101287708/#101295729
https://desuarchive.org/g/thread/101392789/#101398280
I noticed it with formatting, like with a markdown code block. At 8k it gets it right, but scaled it starts to make mistakes.
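For reference, the common "linear" RoPE scaling just compresses the position indices so a longer context maps onto the position range the model was trained on, which is also why fine position distinctions get mushier. A toy illustration, not llama.cpp's actual code:
[code]
# Toy illustration of linear RoPE position scaling, not the real llama.cpp implementation.
def rope_angles(pos: float, dim: int, base: float = 10000.0, scale: float = 1.0):
    pos = pos * scale  # scale < 1 squeezes positions, e.g. 0.5 maps a 16k ctx onto 0..8k
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

native = rope_angles(8000, dim=128)              # a position the model actually trained on
scaled = rope_angles(16000, dim=128, scale=0.5)  # position 16000 now "looks like" 8000
print(all(abs(a - b) < 1e-9 for a, b in zip(native, scaled)))  # True
[/code]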
>>
File: file.png (139 KB, 1368x772)
>>101566053
>>
>>101566083
anecdotal, but fireworks is doing some shady shit. llama3 70b was underperforming on benchmarks last I tested it through their api
>>
>>101566056
>At 8k it gets it right, but scaled it starts to make mistakes
But its supported context is 4k. Are you saying it's fine scaled 2x to 8k?
>>
>>101566083
Is that per token?
>>
>>101566109
yes, $3 per token in, if you send a large document you have to pay a year's salary
>>
>>101566103
>But its supported context is 4k
no, it's 8k
>"max_position_embeddings": 8192,
https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3/blob/main/config.json#L19
>>
>>101566129
still probably worth it. 405b has to be smarter than the average salaryman
>>
File: 1721103827875.png (76 KB, 1850x175)
>>101566103
Yeah, it works at 8k without scaling.
>>
>>101566135
>>101566141
Damn I guess I'm just retarded then, I was switching back and forth between 4k and 8k trying to notice the difference and thought I was going insane, thanks
>>
All right, prices are beginning to descend from the delusional stratosphere on Volta. Check it: https://www.ebay.com/itm/335490833886
$1600 isn't bad if they're V100 32GB. If they're just 16GB, forget it, just continue to stack P100s.
>>
>>101566245
And that's the complete server? Just plug that whore in and you're good? I can't believe that's less than $2k. Holy shit.
Of course the obvious caveat is that it's loud and a power hog. But how bad?
>>
put the fucking base 405B on the openrouter or else, basterd
>>
>>101566245
>w/ 4x TESLA P100 SXM2
>if they're V100 32GB
...
>>
what's the writing style called where it's like:
>She walked into my office smelling of trouble, a dame on dagger heels.
like casablancian or something
>>
Who will become rich from founding a video generation company?
>>
File: 0_tADWew2sYSAk1pvM.jpg (20 KB, 640x480)
Which is better, Meta's Llama 3.1, or Google's Gemma 2?

I am just downloading Gemma 2 to run it on my machine. I just tried it online and it seemed to answer some questions better than Llama 3.1 (just for my particular questions - I'm sure Llama 3.1 is probably better for other types of questions)
>>
>>101566390
hardboiled / detective noir
>>
>>101566401
>I'm sure Llama 3.1 is probably better for other types of questions)
it's not. Gemma 2 mogs llama 3.1 in every way
>>
>>101566401
Gemma 2 spat out complete schizo shit in every language known to man when I ran it with llamacpp, let me know which settings you're using once you get it to work
>>
>>101566401
Gemma 2 is way better

Current tierlist is:
L3.1 70B
--small gap--
Gemma 2 27B
--big gap--
Nemo 12B
--tiny gap--
Gemma 9B = L3.1 8B
>>
>>101566401
In the <10B weight class they're probably around the same when it comes to being a general assistant. Gemma can be better for ERP. The issue with Gemma is the low context, which can limit your use cases.
>>
>>101566426
I have been using exclusively command-r-plus q4_k_m for months and suffering 0.6 tokens/s on cpu, gemma 2 27b is the only model I've tried so far (including 70b models) that is as good at reasoning as cr+, so I find it hard to believe l3.1 70b is better
>>
>>101566426
Is Gemma better than largestral
>>
>>101566264
1U is going to be very loud. I had an R720 with a pair of P100s, and that's a 2U, and it was loud since it needed a 60% fan offset to maintain the needed idle airflow.
There's nothing wrong with 1U, just keep it in a separate room behind a closed door.
>>
Do you guys prefer gemma 2 it or base?
>>
>>101566382
Well yeah, of course it conveniently leaves that out, so unless you get it in writing, assume it's 16GB V100s. Or maybe you can fuck them over on a NAD claim and get it for free.
>>
>>101566464
not even close
>>
>>101566496
are you genuinely retarded?
>>
>>101566401
Are you using this as some kind of evaluation or are there questions you ask an LLM because you really want an answer?
>>
>>101566521
>actually trying to be useful by determining use case
sir, we don't do that here, you just shill what you prefer.
>>
>>101566486
I can't find any gguf quants of the base model with llamacpp fixes so there's no way to know.
>>
>>101566486
No one uses or tests base models these days.
>>
>>101566508
They make V100 SXM2 in 16GB and 32GB variants, so no, go fuck yourself.
>>
Huggingface is driving me nuts with uploading my quant. I have git lfs enabled on the repo, I have the file broken up by llama-gguf-split, and still, I get the following error when pushing:
error: RPC failed; HTTP 413 curl 22 The requested URL returned error: 413
send-pack: unexpected disconnect while reading sideband packet
Writing objects: 100% (13/13), 28.39 GiB | 84.82 MiB/s, done.
Total 13 (delta 2), reused 1 (delta 0), pack-reused 0
fatal: the remote end hung up unexpectedly
Everything up-to-date

Ideas?
>>
>>101566563
It's because base models are usually incoherent and quick to lose track of what's happening at the normal sizes especially below 70b when you try to use them for chat. But gemma is SOTA and sort of on par with 100b+ models in terms of coherence, so I wouldn't be surprised if the base model can just be plugged into sillytavern and work perfectly.
>>
>>101566401
What's your use case? If it's RP and smut L3 absolutely sucks at it because they filtered out NSFW from training data. If it's coding then ChatGPT is free and also I wouldn't use anything less than SOTA for coding
>>
>>101566643
>But gemma is SOTA and sort of on par with 100b+ models
wow shills in overdrive today, you gonna claim it's better than largestral i guess?
>>
>>101566638
>error: 413
Maybe make the files smaller or upload them independently? I've never uploaded to hf, but I don't think receiving an error code that means "content too large" is a bug; it's pretty clear what it means
>>
>>101566638
I think you should use their own lib to upload to their repos in that case.
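Something along these lines with huggingface_hub should work (repo id and folder are placeholders); it handles chunking and retries itself instead of one giant git push:
[code]
# Sketch: upload the split gguf parts with huggingface_hub instead of raw git-lfs push.
# repo_id and folder_path are placeholders.
from huggingface_hub import HfApi

api = HfApi()  # assumes you're logged in (huggingface-cli login) or HF_TOKEN is set
api.upload_folder(
    folder_path="./my-quant",                 # directory containing the llama-gguf-split parts
    repo_id="your-username/your-model-GGUF",
    repo_type="model",
)
[/code]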
>>
>>101566657
Largestral literally just came out so I don't know. What I meant was that it's about as good as CR+
>>
>>101566668
the limit is supposed to be 50gb afaik, so his 29gb should be perfectly fine
>>
>>101566563
One big problem is that base models are too fucking schizo and loopy at the same time. Why would looping happen with base models even at high temperature (~1) anyway? I don't get that, wouldn't their wider token selection prevent it?
>>
>>101566680
And that's absurd, try harder please. If you'd said 70B it could at least perhaps be argued.
>>
>>101566703
>70b
lmao nice joke
Name 1 single 70b model that's even vaguely usable (except miqu)
>>
Are quants made with imatrix actually good? Should I be making all my quants using imatrix?
>>
>>101566670
I'll pull them down to my desktop and try shoving them up from there in their web interface.
>>
>>101566739
no
>>
>>101566643
gemma is good, but not so good that it jumps a size class. maybe if you have to choose between low-quant 70b and q8 gemma then yeah, but I run high quants of both and gemma is noticeably dumber, loses nuance and makes more mistakes
I would love to use a smaller, faster model if it gave the same or better results but it simply doesn't
>>
Babe wake up, a 1T parameter model just got released
https://huggingface.co/CofeAI/Tele-FLM-1T
>>
>>101566799
>create shitty 1T model
>claim it's good
>nobody can tell because nobody can run it
>>
>>101566813
low-key way of storing personal backups on huggingface servers
>>
>>101566799
> it has been trained on approximately 2T tokens
>"max_position_embeddings": 4096,
useless
>>
>>101566813
>>
>>101566830
i'd rather they use the compute on making their 52B better (10+T train, 32k+ ctx) sad to see really
>>
>>101566755
Can you elaborate? Does it make output more sloppy or something?
>>
>>101566053
sure, but quantized...
>>
>>101566830
>>101566799
>we've made extensive efforts to thoroughly clean and filter the training corpus for the model
>>
>>101566885
pedos are going to ruin videogen for us before we get a local model
>>
>>101566799
What rig do I need to run this at Q4?
>>
>>101566830
You don't need more than 2k context.
>>
>>101566813
Can I run it on my gtx 950?
>>
>>101566908
but how am i gonna have it write me an entire codebase overnight with just 512 ctx?
>>
>>101566927
Summarization, using 256.
>>
>>101566893
at least 20 3090s for 4bpw, 40+ for 8bpw
>>
>>101566942
you drive a hard bargain, those 64 tokens are really tempting...
>>
>>101566989
precocious 14 year old teen girl on the beach
>>
File: 11__00898_.png (1.21 MB, 1024x1024)
>>101566739
The importance matrix (imatrix) is what enables the really small quant types like 1-3 bits (IQ quants), and it can also be applied to regular K-quants to cut the quality loss a bit. It requires a separate calibration pass compared to traditional (static) ggufs, but for the end-user the resulting file is used the same way.
The low-bit IQ types do take a performance hit if you can't offload them entirely to GPU.
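The usual llama.cpp workflow looks roughly like this (binary names have changed across versions, older builds call them imatrix and quantize; the calibration text and paths are placeholders):
[code]
# Rough sketch of the imatrix workflow via llama.cpp's CLI tools; paths are placeholders
# and binary names differ between llama.cpp versions.
import subprocess

# 1) build an importance matrix from some representative calibration text
subprocess.run([
    "./llama-imatrix",
    "-m", "model-f16.gguf",
    "-f", "calibration.txt",
    "-o", "imatrix.dat",
], check=True)

# 2) quantize with it: required for IQ types, optional but helpful for K-quants
subprocess.run([
    "./llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-f16.gguf", "model-IQ3_XXS.gguf", "IQ3_XXS",
], check=True)
[/code]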
>>101566638
Sounds like a sign from above anon.
>>
>>101564560
Had to in order to protect his ass legally for the future. He said he did so after a discussion with a lawyer. It is a grey area after all.
>>
>>101567028
I mean at quants like Q4. Will using imatrix give better output?
>>
>>101567050
>discord troon trying to associate miku with pedophiles
Jart lost, troons lost, ywnbaw neck yourself and leave
>>
How does Command R compare to Mixtral 8x7B merges like BagelMIsteryTour v2 or Nous Hermes 2 Mixtruct?
>>
>>101567094
It's similar to the difference back in the day between llama1 30b and 70b
>>
>>101567094
all three mogged hard by dolphin 2.5 literally local gpt4 (both good and bad)
>>
>>101567094
Why not use Nemo?
>>
>>101566425
I just ran Gemma 2 (9 billion parameters) on my laptop and it runs fine, but it's a bit slow, and it makes the laptop sound like a jet engine every time I submit a prompt, because the CPU suddenly spikes to >90C

Indeed I don't have a GPU on this laptop. Well there's an iGPU but it doesn't seem to be using that.
>>
are we still using mixtral for erp? or has anything changed?
>>
>>101567147
>Well there's an iGPU but it doesn't seem to be using that.
perhaps try to use llama.cpp's vulkan backend? might work?
>>
>>101566521
I'm looking at them because I have an idea for a website where an LLM could help with certain things. I'm wondering if I could run a model on a VPS. I think I could, but the LLM responses would take a couple of seconds if the VPS doesn't have access to a GPU. That might be okay though. Renting GPU time means more money of course.
>>
>>101567147
iGPUs are barely usable for this anyways due to how the memory split works
>>
I watch the flickering outputs of the model giving refusals and the code early-stopping and switching seeds/bumping temp and re-generating over and over, sometimes 10 times or so. Time wise it's all over in a second and the model falls in line, giving the requested answer.
The censoring attempts are futile. A mere inconvenience that adds a fraction of a second of overhead to an otherwise functional and efficient framework.
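Conceptually it's nothing more than this; generate() is a stand-in for whatever backend call you actually use, not a real API:
[code]
import random

# Conceptual sketch of the reroll loop described above. generate() is a placeholder
# for your local inference backend, and the refusal markers are just examples.
REFUSAL_MARKERS = ("i can't", "i cannot", "as an ai", "i'm sorry, but")

def generate(prompt: str, seed: int, temperature: float) -> str:
    raise NotImplementedError("call your local backend here")

def insist(prompt: str, max_tries: int = 10, temperature: float = 0.8) -> str:
    out = ""
    for _ in range(max_tries):
        out = generate(prompt, seed=random.randint(0, 2**31), temperature=temperature)
        if not any(marker in out.lower() for marker in REFUSAL_MARKERS):
            return out            # no refusal detected, keep this generation
        temperature = min(temperature + 0.1, 1.5)  # bump temp and reroll
    return out                    # give up after max_tries
[/code]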
>>
>>101567182
So, on the recent Intel stuff (Iris Xe), it seems to be able to access the full system memory. However, in my testing on my lil N305 system, the iGPU is not faster than the CPU. I guess the only advantage is you have no load on your CPU cores, but they're going to contend for the shared memory bandwidth, so unless they improve things, the iGPU is useless.
>>
I'm thinking of taking some small 2B model and expanding it to 1T parameters, but only the 2B parameters would ever get accessed by the network. The rest would be filled with encrypted personal data.
>>
>>101567121
Given comparable training quality I'd rather fill more of my GPU with model weights and less with context.
>>
>>101567223
>>101567223
>>101567223
>>
>>101567184
are we... the baddies?
>>
>>101567184
Teenagers are "children" only in the retarded angloamerican mind.
>>
File: Capture.jpg (36 KB, 725x771)
Poll will close any minute now. Sao is in the lead.


