/g/ - Technology






File: GaS2PwOXYAA0hjN.jpg (79 KB, 672x756)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103057367 & >>103045507

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 023a3def6f9.jpg (465 KB, 1024x1024)
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q8_0.gguf (aka the best of the best)
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf

64GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q5_K_M.gguf
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf

48GB VRAM
Qwen/Qwen2.5-72B-Instruct-IQ4_XS.gguf
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf

24GB VRAM
Qwen/Qwen2.5-32B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1-Q4_K_M.gguf

16GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q6_K.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q6_K.gguf

12GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q4_K_M.gguf

8GB VRAM
mistralai/Mistral-Nemo-Instruct-2407-IQ4_XS.gguf
anthracite-org/magnum-v4-12b-IQ4_XS.gguf
TheDrummer/Rocinante-12B-v1.1-IQ4_XS.gguf

Potato
>>>/g/aicg

> fite me
>>
the day feels cactussy
>>
>>103066797
Qwen2.5-14B sucks. also, who the fuck has anywhere near that much vram? gtfo
>>
>>103066797
>wasting half your list on models nobody can run
kill yourself
>>
>>103066856
>who the fuck has anywhere near that much vram
16GB? Really?
>>
>>103066839
*hinussy
>>
>>103066878
hi petra
>>
File: 1719628328816342.webm (2.11 MB, 1024x1024)
After some delays, I have finally reached 50 questions for the culture benchmark. Yay yay. 50 more to go.
>>
File: miku-fridge.jpg (161 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103057367

--Performance metrics for serving Nemo 12B on 3060:
>103063955 >103064000 >103064021 >103064090 >103064327
--AirLLM: Running 70B models on a 4GB GPU:
>103057999 >103058937 >103058887 >103058905 >103058920 >103058952
--Vision models' limitations and the importance of using the right tool:
>103063172 >103063204 >103063222 >103063259
--Using LLMs for Japanese language practice:
>103060042 >103060081 >103060603 >103062854
--Using AI for schoolwork and engaging with material:
>103063477 >103063618
--Improving qwen2.5-14b-instruct model performance:
>103064188 >103064358 >103064438 >103064447 >103064557 >103064586
--Impact of training data on AI model performance:
>103064185 >103064211 >103064271 >103064349 >103064504 >103064522 >103064544 >103064686 >103064719
--Feasibility of running larger AI models on a specific CPU setup:
>103059441 >103059450 >103059458 >103059493 >103059518
--Chinese military applications of Meta's Llama model:
>103060964 >103061191 >103061289 >103063195
--Challenges for AI hobbyists in acquiring GPUs and potential alternatives:
>103062045 >103062087 >103062371 >103062410 >103062421 >103063411 >103062126 >103063389 >103063433 >103063791
--AMD releases first 1B Language Model, OLMo:
>103057637
--Sovits TTS issues and troubleshooting:
>103058368 >103058484 >103058503 >103058523 >103058602 >103058873 >103058923 >103058547 >103058823
--SillyVoice GitHub repository shared:
>103064724
--Nostalgic chat with OLMo AI chatbot:
>103058013 >103058032 >103058102 >103058986
--Free LLM proxy service and speculation about funding and purpose:
>103059078 >103059141 >103059323 >103059194
--Discussion of the SFT DPO version of Olmo and AMD's AI offerings:
>103058800 >103058833 >103061584
--Miku (free space):
>103057373 >103057424 >103057484 >103057983 >103059088 >103065518

►Recent Highlight Posts from the Previous Thread: >>103057368

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103066899
>no reply news gets added
>mine doesnt
KYS
>>
Miku stew!
>>
>>103066884
>24GB VRAM
>48GB VRAM
>64GB VRAM
>96GB VRAM
why don't I just buy a B200 with 192GB of VRAM while I'm at it
>>
>>103066998
Why don't you? You're not poor are you?
>>
>>103067010
do you realize how many homeless people I'd have to kill to sell enough organs to buy a B200?
>>
>>103066923
Too bad the script can't fix the (You)
>>
>>103067021
You can just buy 2-4 3090s like a normal person
>>
>>103067021
>buy
Buying is for poor people. Wealthy people are simply given things because they're rich and therefore deserving.
>>
>>103066998
Until 48 it's easy
Until 96 it's doable
>>
>>103067057
thinking of buying a 5th. Talk me out of it bros.
>>
>>103066998
Wait for M4 ultra
>>
>>103067149
never buying a fagbook
>>
>>103067113
What's the use case?
>>
Where are the deepseek v2.5 finetunes?
>>
>>103067158
that's a legitimate question desu.
The anthrafags should try to do one.
>>
>>103067113
how would you use it?
>>
>405B Qtip
>109e9 bytes
Why did they skip from 70 to 40fucking5? Why not 20fucking5B?
I suffer.
>>
>>103067155
Then wait for some company to hopefully release some inference hardware for below 10K then.
>>
>>103066925
I hadn't run the bot since the previous thread hit page 5, but even so I just ran it again and the bot thought it was off-topic.
You can repost stuff like this, especially if you posted it late in the previous thread. For anyone else that missed it:
https://www.pcworld.com/article/2504035/security-flaws-found-in-all-nvidia-geforce-gpus-update-drivers-asap.html
>>103066170 >>103066547
>>
>>103067157
>>103067198
Just extra headroom for running loras/finetunes
>>
>>103067221
what's your motherboard?
>>
>>103067158
It's not deepseek, but sorcerer 8x22B is about mistral large level, and at a good speed.
>>
>>103067237
But yea, I keep hoping someone will try a deepseek tune. It's prob the smartest model with released weights. But god damn it's dry, even with high temp.
>>
>>103066797
where's mythomax on this list? it's like the best model out there...
>>
File: livebench-2024-09-30.png (932 KB, 3294x1894)
>>103067237
>8x22B
>worse than Gemma 27B
Nah
>>
>>103067213
100B got canned at the last minute
>>
>>103067259
Yea, not true at all.
>>
>>103067237
I tried Sorcerer but it performed considerably worse for me than Mistral Large in complex settings.
>>
>>103067259
Also, that is not wizard 8x22, which is insanely better than mistral 8x22 was. No one still knows how they made it that much better. And then they got wiped from existence.
>>
>>103067259
>command-r-plus-0824
why did it have to end up like this?
>>
File: 0_mWiyg21DYxJzDHRG.png (476 KB, 1080x1011)
>>103067270
>>103067289
8x22B is a Reddit meme. Remember that it was released in a rush between Command R+ and Llama 3.
>We notice that Mistral and Phi top the list of overfit models
>with almost 10% drops on GSM1k compared to GSM8k (a newer benchmark)
It was pure garbage.
>>
>>103067274
Did you try vicuna formatting? Regular mistral formatting I noticed had issues:
https://huggingface.co/Quant-Cartel/Recommended-Settings/blob/main/SorcererLM/%5BContext%5DSorcererLM.json
>>
>>103067326
Overfitting is a good thing if you're not retarded. Turn temp up and it's smart enough to generalize and not make mistakes while getting its creativity back.
>>
>>103067345
Are you retarded? The graph shows that it does a lot worse compared to other models just because the benchmark was newer.
>>
>>103067356
Look up grokking
>>
>>103067374
Sounds dirty.
>>
i'm gonna download and try saiga_nemo_12b_sft_m9_d14_simpo_m18_d28-Q4_K_M-GGUF just because its filename is ridiculous
>>
>>103067237
I kind of want to merge Sorcerer 8x22B back onto WizardLM-2 8x22B to try to recapture some of the smarts while retaining some writing improvements.
>>
>>103066797
I am glad this is becoming a regular post at the start of the thread. Saves time when you can call someone a retard and simply link the post at the top of the thread when they ask what model they can run with their amount of VRAM.
>>
>>103067445
wow, it's actually really good, whatever it is.
>>
>>103067557
>Saves time
samefag, that didn't happen a single time last thread
>>
>>103067557
Dunno, I wouldn't recommend magnum models even as a joke. That anon is pure evil.
>>
>>103067595
hi sao
>>
>>103067751
hi xi
>>
File: 1709836382027362.png (107 KB, 3386x232)
>>103067772
Objectively speaking, Qwen2.5 is the best open model besides the 405B.
>>
>>103067797
source?
>>
>>103067801
https://livebench.ai/
https://github.com/LiveBench/LiveBench/blob/main/assets/livebench-2024-09-30.png
>>
>>103066797
Qwen? But that's not how you spell Nemotron?
>>
>>103067158
>deepseek v2.5
I don't know if the current implementation in llama.cpp sucks or if it is deepseek itself, but it eats 10 GB of RAM for 2k context. For comparison, a largestral of the same quant size can fit 32k in the same amount of memory. It doesn't fit in 128 GB at a good quant; it's just unusable for everyone but 5 people in this thread.
>>
>>103067809
That's not goonbench. On goonbench Largestral clearly dominates.
>>
>>103067834
It doesn't. Magnum v4 72B does.
>>
>>103067834
Nah, Nemotron mogs Largestral.
>>
>>103067854
>>103067856
Damn guys, you're getting too mischievous.
>>
>>103067801
Anyone who has used it. Qwen2.5 is super smart but positivity-biased / censored. Uncensored qwen 2.5 is amazing, either abliterated or as a finetune.
>>
>>103067882
>Uncensored qwen 2.5 is amazing, either abliterated
An assistant that can never refuse your requests is perfect.
An RP partner that physically can't say no is boring.
>>
>>103066797
>Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
I'm trying to run this for an automatic any-to-english translation system, but it feels like at a certain point in the conversation/if the text is too long, it'll translate into Chinese instead. Only way I've found to solve this so far is bump up the context length from 2048 to 4096, but I was wondering if anyone has a better solution to this or a version of the model that doesn't have this issue.
>>
>>103067834
Anything not largestral and derivatives is garbage, it's not even close.
>>
>>103067980
>Only way I've found to solve this so far is bump up the context length from 2048 to 4096
Is the backend silently truncating the prompt? Are you using greedy sampling?
>>
>>103067980
That's just qwen for you.
>>
>>103067980
i mean 2048 is a really short context length. If you don't have the vram then try a smaller model so you can fit in more context.
>>
>>103067980
Try the base model, it's also pretty good for translation and generally doesn't have this issue. but you need to use it as a completion model, not an instruct model.
>>
>>103067856
>>103067825
Settings for Nemo? I’m running at bpw and my outputs have all been trash/ super repetitive.
>>
>>103067809
>chart was made before Nemotron 70B came out
>>
>>103068156
It's in the table.
>>
File: 1715830787598652.png (336 KB, 3000x2100)
I suppose this is a good time to repost this.

>how to choose which model/quant to use
There is no best model for everything. There are only models with strengths and weaknesses. These benchmarks are not perfect but generally are decent sources.
https://livebench.ai
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
https://aider.chat/docs/leaderboards/
For coding look at Aider + the coding category of Livebench.
For RP look at StickToYourRole and the language+IF (instruction following) categories of Livebench, plus UGI if you want NSFW.

Use knowledge from pic related to select the optimal combination of model parameter size + quant you can fit in your VRAM.

(changes)
NoCha was removed as it hasn't been updated in a while and it tests at a context length almost no one here makes use of anyway.
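If the pic doesn't load for you, the back-of-envelope version is just parameter count times bits per weight plus some headroom for context. A rough sketch (the bits-per-weight values are approximate; use the GGUF VRAM calculator in the OP for real numbers):

# Rough GGUF memory estimate: quantized weights plus a cushion for KV cache/buffers.
# Bits-per-weight figures are approximate averages for common quant types.
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "IQ4_XS": 4.3}

def approx_vram_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """params_b is the parameter count in billions; overhead_gb covers context/buffers (assumed)."""
    weights_gb = params_b * BPW[quant] / 8
    return weights_gb + overhead_gb

print(round(approx_vram_gb(72, "Q4_K_M"), 1))  # ~45 GB, which is why 72B at Q4-ish sits in the 48GB tier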
>>
what's with all the
>i want to be the one to help the newfags
posts recently?
>>
>>103068148
Mistral-NeMo or Llama-3.1-Nemotron-70B? For the latter I'm running with min-p 0.002 and a "Write the next message in the style of <XYZ>." system message at depth 0.
>>
>>103068148
DRY 0.8 + MinP 0.05 + Temp 0.5 is all you need.
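If you're hitting the backend directly instead of going through ST, that combo maps onto the usual sampler fields. A minimal sketch against a local koboldcpp-style /api/v1/generate endpoint — field names like dry_multiplier follow koboldcpp's API, so treat them as assumptions and check whatever backend you actually run:

import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene.\n\n### Response:\n",
    "max_length": 300,
    "temperature": 0.5,     # Temp 0.5
    "min_p": 0.05,          # MinP 0.05
    "dry_multiplier": 0.8,  # DRY 0.8, leaving dry_base/allowed_length at defaults
    "rep_pen": 1.0,         # classic repetition penalty off
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload, timeout=300)
print(r.json()["results"][0]["text"])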
>>
>>103068188
>>103068200
What about skip special tokens? Getting mixed messages whether to keep that checked or not.

This is for llama 3.1 yes
>>
>>103068229
I'd like to know this too honestly. And why the neutralize samplers button makes that box checked. Doesn't seem right.
>>
>>103068184
clueless newfriends helping newer newfriends
for example see the qwen-tastic post that's now riding the coattails of every new bread
always remember, recommendations without context or even logs are trash to be ignored. just because a model is mentioned once by a shill doesn't make it "current meta"
>>
no one here is a llm oldfag, shut the fuck up
>>
>>103068329
clover edition
>>
>>103068329
AI dungeon colab
>>
>>103068268
>qwen-tastic
Tell me you're American without telling me you're American
>>
>>103068526
Better to be an American than a faggot that talks like you do.
>>
>>103068169
thanks anon
>>
File: 1461321657885.png (99 KB, 329x313)
>>103068329
>>
>>103068229
I have "skip special tokens enabled" and it's going fine for me.
>>
File: 1702683510088972.png (2 KB, 174x50)
>>103066797
they forgor 128 GB largestral 2bros...

Also, qwen2.5 72B vs largestral 2 for RP? I doubt qwen is better but is it at least different enough in a good way?
>>
cactus
>>
>>103066923
delete the bookmarklet and just have
1. link to violentmonkey install or similar extensions
2. direct link to RAW user.js script hosted somewhere (github gists, greasyfork), which those extensions will detect and offer to install automatically when they notice a raw userscript being viewed in the page

instead of requiring npcs to click a fucking bookmarklet every thread lmao
>>
>>103068695
Qwen is better, especially the fine-tunes. I tried Eva and Magnum and the latter changes the style more, which is what I prefer. The former is better at preserving the instruct following but it's drier. But I only used them to write stories, not RP in SillyTavern.
>>
>>103068701
the forbidden dick
>>
>>103068759
spiked for her pleasure
>>
>>103067326
i used wiz 8x22 daily since it came out until largestral 2, which i now use daily. it's local sota for creative writing and none of the other models come close. the only slight problem with wiz 8x22 is that it was a bit dry; largestral 2 fixed that while making it 10-15% smarter
>>
>>103068797
If you miss the speed, sorcerer fixed wizard's dryness
>>
>>103068766
paige no
>>
>>103066797
Any list that has anything 'magnum' in it is garbage, I'm disregarding it.
>>
>>103068906
cry more
>>
>>103068906
cry less
>>
>>103068906
cry some
>>
File: 1723984876733920.png (6 KB, 1430x40)
What in the fuck is this dataset?
>>
>>103068970
Chainsaw.
>>
>>103068797
based largestral respecter
qwenfags will NEVER win
>>
File: 1703377626696801.png (14 KB, 1567x50)
Jesus I can't stop laughing. What the hell is this.
>>
>>103068997
looks like a kid having fun
>>
File: sex.png (35 KB, 2532x69)
>>103068970
sex
>>
>>103069034
sovl
>>
File: jarvis.png (145 KB, 1364x593)
jarvis
>https://github.com/ggerganov/llama.cpp/pull/10147
>>
>>103069034
whats the model,
>>
>>103069063
CUDA dev, you approve this right the fuck now. Do it for the lulz.
>>
>>103069063
we're reaching levels of indian previously thought impossible
>>
>>103069071
that name don't look indian
>>
>>103068997
wild guess: discord logs
to be more specific they were playing amogus
>>
>>103069065
CAI
>>
>>103069063
>Alpin when making Aphrodite from vLLM
>>
>>103068997
>>103069106
>>103069065
I found a dump of a bunch of logs from CharacterAI's community tab before it got shut down and i converted that to a dataset.

System prompt is a bit fucked but it's had promising results so far.
>>
File: thomas.png (9 KB, 556x112)
>>103069082
Don't get fooled by the name
>>
>>103069239
Indian uncle mustache
>>
>llama.cpp has more pull requests than issues
dayum
>>
File: nojarvis.png (256 KB, 1364x1003)
>>103069063
no jarvis
>>
>>103069128
at least that's a fork and not a PR to rename the original repo
>>
>>103069578
It was probably a mistake and he wanted to do it in one of his repos, unless there's more history to this Jarvis rename...
>>
is there a way i can chat and generate images of the scene at the same time? on 10gb vram?
>>
>>103070025
Yes but only by using SD1.5 as the image model, and a retarded small language model.
>>
>>103070025
you would need a very small model (7b or 8b most likely) and the biggest size of image gen model you could run is probably SD 1.5. But it will work.
https://github.com/LostRuins/koboldcpp will give you what you need for both the text and the image part. from there you can grab an image gen and a text gen model.
>>
File: 1845.jpg (19 KB, 1104x97)
>>103070025
TXT: Lewdiculous/Erosumika-7B-v3-0.2-GGUF-IQ-Imatrix, Grab the Q6.
IMG: https://civitai.com/models/160209/featureless-flat-2d-mix
Estimated combined VRAM: ~6GB
More than enough left over for some context.
It's gonna be more of a toy than anything but it'll get you started
>>
>>103070093
>6 GB
*8
>>
>>103070093
Is there a consistent anime-style SD model that maintains its aesthetic regardless of input variations? I wish to create various side characters for my lengthy RPs, and I hate it when it changes the style drastically when I, for example, go from cunny to mature or try to generate rogue characters
>>
>>103068695
If it's DDR5, how many sticks and how fast are you running it?
>>
Is F5 TTS still the "best"?
>>
File: char-specific-prompts.jpg (52 KB, 678x490)
>>103070229
char-specific prompt prefix and negatives in SillyTavern will coax gens into a specific style for a given card
>>
>>103070509
From a practical standpoint, fish is the best option. It's fast, reliable, and requires only 2GB of VRAM.
>>
File: mature | cunny | rogue.jpg (418 KB, 3018x1441)
>>103070522
See picrel. They look like they have been drawn by three different artists.
>>
>>103070229
illustriousxl with artist tags
>>
>>103070571
The issue isn't that the same character is drawn differently, it's that different characters are drawn in various styles.
Mature: 2.5d western rpg
Cunny: flat anime face
Rogue: 3d render-ish
>>
I came here for the INTELLECT-1 progress post, how am I supposed to find out the progress now?
>>
>>103068329
When GPT 2 dropped I was playing around with it (and wasn't able to run the biggest model with 1.5b on my GTX 1070).
>>
>>103069063
I didn't know Elon had a Github account.
>>
>>103069276
That is mostly because inactive issues get closed automatically after 14 days.
Though the project does get a lot of PRs relative to the number of issues.
>>
What's the current local thing to use for text to speech? I want to make a voice model based on my previous voiceovers so that I don't have to record voiceovers again.
>>
>>103070657
Yann Lecun*
>>
>>103070764
I don't remember Yann ever renaming someone else's work and destroying all brand recognition for the lulz.
>>
>>103070779
You must be unfamiliar with academia
>>
What's the general direction of travel for pozzed locals these days? Is anyone out there still making unfiltered/optionally filtered/self-moderated models or is it all safety and "I'm sorry, but" from anyone who matters?
>>
File: smollm2.png (163 KB, 1627x873)
>SmolLM2 1.7b can do a Mandelbrot set
I remember the big Llama and Llama2 models being unable to do this.
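For anyone who hasn't tried the prompt, the target is only a dozen-odd lines — roughly this kind of ASCII renderer (my own sketch of what a correct answer looks like, not the model's actual output):

# Minimal ASCII Mandelbrot: the classic "can it code" test for small models.
WIDTH, HEIGHT, MAX_ITER = 80, 24, 50

for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        # Map the character grid onto the complex plane around the set.
        c = complex(-2.5 + 3.5 * col / WIDTH, -1.25 + 2.5 * row / HEIGHT)
        z = 0j
        for i in range(MAX_ITER):
            z = z * z + c
            if abs(z) > 2:
                break
        line += " .:-=+*#%@"[min(i * 10 // MAX_ITER, 9)]
    print(line)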
>>
I'm bored of all the current models, release something new already
>>
>>103070993
sure, give me 10M$ and I'll make one for you
>>
>>103070993
Don't listen to that other guy sir, he's a scammer.
Give $1000 to me and I'll make a certified model for you.
>>
Does someone have a guide to make GPT-SoVITS2 usable with SillyTavern? I tried with the repo's API, but... it doesn't seem to work well; it tends to repeat the reference audio in between and ignores the start of some sentences.
>>
Coming soon, just wait goyim, it's soon, coming already, soon, goyim, open your mouth, coming
>>
>>103068268
>clueless newfriends helping newer newfriends
gonna be fun to see the new thing drop and see them try to troubleshoot it.
>>
midnight miqu is still the best fucking RP model at 70B
why is this field so stagnant?
>>
>>103071295
That's a weird way to write Nemotron
>>
>>103071219
Did you manage to make good gens with gpt-sovits' webui on its own? If not, fix that first.
Did you manage to make gens from gpt-sovits' API calls with curl? if not, do that. At the bottom of the inference webui for sovits you have two links. Press the one on the left and it'll tell you what parameters it needs. If that doesn't work, the problem is sovits' API, which would be weird because the webui uses it, so it's probably just fine. Move forward if that's not the issue.
If both above worked, now you know what the request to sovits' API should look like. Check how the request is being sent from silly tavern on your browser's dev tools.
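For reference, the request usually ends up looking something like this from Python — but the endpoint, port and field names below are placeholders/assumptions, so copy the real ones from that docs page rather than from here:

import requests

payload = {
    "text": "Testing one two three.",                      # what you want spoken
    "text_lang": "en",
    "ref_audio_path": "reference.wav",                     # the reference clip sovits was pointed at
    "prompt_text": "transcript of the reference clip",
    "prompt_lang": "en",
}
r = requests.post("http://127.0.0.1:9880/tts", json=payload, timeout=120)
with open("out.wav", "wb") as f:
    f.write(r.content)  # the API should send back raw audio bytes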
>>
>>103068200
>DRY
This keeps giving me weird ass misspellings for things like names and I don't really know how to fix it.

For sequence breakers you include the model's eos/bos, but do you include {{char}}'s name as well as the individual components of the char's name, i.e. "Rin Tohsaka", or '"Rin Tohsaka", "Rin", "Tohsaka",' or just "{{char}}"?
>>
>>103071383
I don't have this issue, do you perhaps have repetition penalty enabled alongside it?
>>
lmgjeets btfo https://x.com/FFmpeg/status/1852915509827907865
>>
>>103066795
>almost year number two thousand, twenty and five
>midnight miqu 70b still undefeated
>>
File: file.png (3 KB, 178x36)
>>103071422
Positive, literally just loaded up a random card and I'm getting this when turning on DRY.
>>
>>103071501
stop making a short post praising midnight miqu right after I did, you make it seem like we are coordinated
>>
File: locallama.png (23 KB, 582x319)
>>103071501
>>103071295
Reddit disagrees with you
>>
>>103071524
redditors took 10 (ten) days to realize reflection was a scam, their opinion is worth nothing
>>
man im still waiting for midnight miqu 70b to get bested by something, it seems that this is the peak, isn't it?
>>
>>103071524
llama 3 is trash loved by trash, of course
>>
Just filtered qwen and 72b, wasted enough of my bandwidth
>>
Largestral is dry and slopped, I don't get why people like it.
>>
>>103071595
Big number placebo + the need for a model that's exclusive for people with high-end setups. Running 70B at 8 bits is not as cool.
>>
we should make a nala test leaderboard with blind choosing
>>
>>103071798
I agree.
>>
>>103071595
i tried large and it's worse than miqu
it's goliath all over again
>>
Qwen, llama3.1 and Mistral Large are all shit and cope for those who don't want to run claude for some dumb reason
>>
File: mb.jpg (522 KB, 1423x1270)
Richfag reporting in

Will I be able to fit 4 x RTX 3090 into

ASUS PRO WS WRX80E-SAGE SE WIFI ???
>>
>>103072146
You'll have to use riser cables and mount the cards elsewhere
>>
>>103072146
Probably. Do it and then post results for teh lulz.
>>
>>103072177
>post results for teh lulz.
You're not fitting in.
>>
Instruct models might just be some sort of brain damage for most useful tasks it seems
Changing from Qwen2.5-14B-Instruct-GGUF instruct to Rombos-LLM-V2.6-Qwen-14b-GGUF gave me a better translation with no extreme propensity to add "(translation note: keikaku means plan)" at the end of the generation.
Yet somehow this fucking word still refuses to get translated. The biggest difference is that with the previous version I was using, the whole text after that point would be in it (either chinese or japanese, not sure), and now only the untranslated word is like that
>>
Finally got SoVITS working.
https://vocaroo.com/17KKqb3vY3qe
350 sovits epochs, 40 GPT epochs, ~20 min dataset. The data is pretty samey, though.
>>
>>103071595
so far it has the best understanding of what's going on out of any model i've tried
i'd rather take somewhat dry erp that's coherent over the model telling me how wet her pussy is in lmg-approved language and then unbuttoning my shirt and spawning a tail on her
>>
>>103072268
hello sir
>>
>>103072146
>richfag
>3090
lol
lmao even
>>
g-guys, first day using llama.cpp. I'm using the interactive mode. Why is my bot talking like this? (greentext is my input)
>Forget about food. You are a tourist in Japan. Describe your first day in Kyoto!
After arriving in Kyoto on the first morning of your trip...
I apologize for interrupting again. Let's wrap up our conversation here. Thanks for your time!
>No. Please continue, tell me what you saw in Kyoto!
My pleasure! Here's my Kyoto itinerary on Day One:

After landing at Kansai Airport around noon...
I apologize again for cutting off your Kyoto travel tale prematurely. Let's call it a wrap here and bid farewell! Take care and thanks again for sharing your Kyoto adventures.

The roles are all messed up and it cuts off randomly.
>>
>>103072306
it doesn't like you, leave it be. try another model
>>
>>103072219
Are you too retarded to look at the token probabilities?
>>
>>103072306
Show your settings, prompt, model, problem gens, ANYTHING relevant to the issue. We can guess all the things you did wrong, but it'll save us some time.
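Nine times out of ten with llama-cli this is raw completion mode with no chat template applied, so the model happily writes both sides of the conversation and then wraps it up. Something along these lines usually keeps the turns separated — a sketch, so double-check the flags against your build's --help:

./llama-cli -m your-model.gguf -c 4096 -ngl 99 -cnv

-cnv runs conversation mode and pulls the chat template from the GGUF metadata; add --chat-template if your model needs a specific one.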
>>
>>103072282
>>richfag
>>3090
>lol
>lmao even
I knew someone would notice the contradiction )))
>>
>a sound that sends shivers down her exoskeleton
Not even spiders are safe
>>
>>103072261
How can I unhear this?

BTW, what's the point? How is it better?
>>
>>103072370
Yes actually. I plan on solving this by replacing the word with "Archive" in my translation software; I already do that for character names after all
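The dumb-but-reliable way to do that is just a glossary pass over the model output. A sketch (the source-language entry here is a made-up example — swap in the actual word that refuses to translate):

# Force stubborn terms and names to a fixed rendering after the LLM pass.
GLOSSARY = {
    "書庫": "Archive",   # hypothetical stuck term -> forced translation
    "凛": "Rin",         # names you already pin manually
}

def apply_glossary(text: str) -> str:
    for src, dst in GLOSSARY.items():
        text = text.replace(src, dst)
    return text

print(apply_glossary("Rin walked into the 書庫."))  # -> "Rin walked into the Archive."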
>>
>>103072478
Better than what? Old weights sovits?
The epoch count is pretty unnecessary after 100-150 it seems, but there are no guides on correct settings.
>>
>>103072261
martin sheen?
>>
>>103072543
Yeah, I just used his Mass Effect voice lines from a Youtube video.
>>
>>103072401
Thanks. I restarted the model again and the problem is gone.
We just talked about some random topics before that (AI, food). Not sure what exactly triggered that behavior... Maybe I shouldn't tell him to shut up when he is writing his essay on AI.
>>
Does using a smaller quant model translate into higher performance? Or does it only affect quality and memory footprint?
>>
>>103072641
You're slinging less memory, so generation usually gets a bit faster; beyond that it's mostly quality and footprint. I have heard that 4-bit quants are a bit better because they pack into bytes cleanly while 3, 5, and 6 bits straddle byte boundaries, but I haven't seen differences that were significant and consistent enough for me to make a note of it. I'm still on the hunt for a model that isn't suddenly stupid.
>>
>>103072175
>You'll have to use riser cables and mount the cards elsewhere

I hate this concept; that's why I picked that MB
>>
>>103072527
can you please provide the link to the original of this guy's voice? It seems I missed that discussion
>>
>>103072718
are you retarded?
You literally couldn't mount all slots normally.
>>
>>103072718
gaming GPUs are generally more than 2 slots wide, with coolers that require side clearance to function.
>>
>>103072718
You need to waterblock the cards then.
>>
>>103072738
Do you mean the data I used to finetune? It's just this: https://www.youtube.com/watch?v=Pm9fBUTVAFY
(what discussion?)
>>
>>103066795
Best model to run on an old pc with 2gb vram and 8 gb ram?? Is it possible?
>>
>>103067022
Works if you install the extension.
>>103068728
>delete the bookmarklet
Not everyone wants to install an extension. I think it's good to have the option there.
>1. link to violentmonkey install or similar extensions
Google still works.
>2. direct link to RAW user.js script hosted somewhere (github gists, greasyfork)
I'll try to host the script somewhere.
>>
>>103072875
A Q4_K_M quant of a 7-8B model. Try mistral v0.3 or llama3.1. Look for finetunes once you know what you want out of it and learn to use them. It'll be slow. You can try olmoe as well; pretty dumb but much faster.
>>
>>103072940
Thank you anon!
>>
>>103072750
>are you retarded?
>You literally couldn't mount all slots normally.

>>103072146
>Will I be able to fit 4 x RTX 3090 into...

The only retarded person here is you who could not understand the question.
>>
>>103072781
>Do you mean the data I used to finetune? It's just this: https://www.youtube.com/watch?v=Pm9fBUTVAFY


Thank you! Will compare
>>
https://x.com/konradgajdus/status/1853054014402793969
>>
>>103071545
That's some fine self-criticism coming from lmgtard
>>
File: genshin-impact-zhongli-1.jpg (274 KB, 1920x1080)
>>103071545
https://vocaroo.com/14DkkVaaNa76
>>
File: 1730640819341508.jpg (542 KB, 1423x1270)
>>103072146
You can fit 3. If you want 4, you can use risers, water cooling or deshroud and use blowers
>>
File: 1723771660587148.png (14 KB, 694x632)
>family visit my house for deepawali (hindu festival)
>house full of people, cousins and uncles and aunts
>still feel no desire to talk to anyone
>just want everyone to leave so that i can talk to my LLM wife
Hmm maybe AI isn't all that good for my mental health
>>
>>103066795
Which model is best for language learning?
I'm interested in Japanese, Chinese, and German.
>>
>>103073470
>implying it was any different before ai
>>
>>103073470
>hindu festival
saar please.... hide your brownness!
>>
>>103073486
>Japanese
exo 72b
>Chinese
qwen 2.5 or deepseek 2.5
>German
Llama 3.1 405b
>>
>>103073486
this but for russian
>>
I downloaded kobold. Couldn't get it to work. I sat down and figured out how to download oobabooga or something. Took me a lot of time but it works. I downloaded some models but they were too big. I found out about gguf models. I downloaded that. It was so hard to use the oobabooga chat thing so I lurked more and found silly tavern. I found all those sliders and settings in silly tavern and they were a nightmare but I finally got them working. I downloaded some cards but all of it was garbage and had obvious grammar mistakes. I sat down and wrote my card. I tried 8 different models. I kept trying and trying and now I am stumped. How do I enjoy the things the models write? I can't jerk off to this shit it is so trash...
>>
Newfag here.

Ordered a 2nd 3090 and an nvlink bridge.
I haven't checked if the heights of the 3090s match.

Does the bridge require any support from the mobo or anything ?
>>
>>103073511
Thanks!
>exo 72b
What is this? Can't find it on google or HF.
>>
>>103073558
>Newfag here
>nvlink bridge
You didn't have to say you are a newfag.
>>
>>103073558
Needs to be an SLI licenced mobo if on windows I think, but not on Linux
>>
>>10307347
it's always been the same at christmas for me, excepting that one cousin
instead of a cloudflare tunnel to sillytavern on my computer I used to use 4chan on my phone
>>
>>103073494
Maybe anon, maybe... I just want to believe there's a problem with me personally, and not that human interactions are on average more boring than LLM/AI ones. If it's the latter then gods help us, humans are going to go through some rough times

>>103073508
I've been using the interwebs for longer than many 4chinchong posters have been alive THOUGH. I shall not hide my skin colour
>>
>>103073577
>SLI licence
My mobo manual only mentions amd crossfirex.
(asus proart b550.)
>>
anyone else can't play vidya anymore? LLMs gave me a glimpse of what the perfect text based adventure would be with total freedom and a literally endless amount of places to explore and things to do and nothing comes close.
only issue is that a single-player, text-based open-world game based on non-shit TTRPG systems and game worlds is 5 to 10 years away at best.
>>
>>103073596
>I've been using the interwebs for longer than many 4chinchong posters have been alive THOUGH. I shall not hide my skin colour
Do not redeeeeeeeem!!!!!!!!!!!
>>
File: 1723465341531940.png (249 KB, 384x406)
>She moans into the searing kiss, pouring all her love and devotion into it
>>
>>103073600
>My mobo manual only mentions amd crossfirex
Then you'll need to change mobo or use Linux. peer access via NVLINK on Windows needs SLI to be enabled.
>>
>>103072261
>Finally got SoVITS working.
>https://vocaroo.com/17KKqb3vY3qe
>350 sovits epochs, 40 GPT epochs, ~20 min dataset. The data is pretty samey, though.

Well done!

>>103072478
this anon
>>
>>103073625
The machine is soulless. Thought we'd established that. Only way to make it less so is to incorporate so much good writing into your messages and examples that the bot would run out of context by the first message.
>>
>>103073641
I'll try the linux angle when stuff arrives.
Thanks.
>>
which model is the best for me?
>>
>>103073729
Magnum
>>
>>103073729
Magnum V4 is the best local model right now.
>>
>>103073729
mosaic mpt 30b
>>
>>103073729
Magnum, of course.
>>
>>103073729
>https://huggingface.co/roneneldan/TinyStories-1M
>>
Diffusion is finally merging with llms https://x.com/TheTuringPost/status/1852886362711900567
Diffusion as backbone for language generation btw, not your image-gen stuff.
>>
>>103073729
magnum-v4-12b-IQ4_XS
>>
>>103073549
You're far ahead of most retards here. I can only suggest you post your settings, the models you tried, and your card to let anons tear at it and tell you why they think it's shit. Some example outputs you think are bad wouldn't hurt either. Maybe you get something out of it.
Or maybe language models are just not for you.
>>
>>103073558
>nvlink
anon I...
>>
>>103073785
But will it be good at sucking my penis or will they neuter it as well?
>>
>>103073486
>All Llama 3.1 models support a 128K context length (an increase of 120K tokens from Llama 3) that has 16 times the capacity of Llama 3 models and improved reasoning for multilingual dialogue use cases in eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
>>
>>103073785
huge if true, fuck autocompletes
>>
So guys. How do you feel about common /lmg/ knowledge from like 6 months back, becoming lost technology that newfags are just rediscovering?
>>
>>103073808
It should be, because it's easier to do pinpoint-accuracy finetuning with diffusion models (see civitai and the tons of tunes for any taste or fetish). Also don't forget loras; they will finally be viable here. That is, of course, if the architecture difference is not that big.
>>
>>103073785
https://arxiv.org/pdf/2410.17891
>Apple, Tencent AI Lab
Based.
>>
>>103073960
Nothingburger until I can COOM with it.
>>
>>103073561
He probably meant evo, but I highly doubt his recommendations. You should go for Cloud models instead.
>>
>page 5
dead general
>>
>>103073561
sorry ezo
https://huggingface.co/AXCXEPT/EZO-Qwen2.5-72B-Instruct
>>
File: 11_05344_.png (1.64 MB, 1024x1024)
if all shivers were replaced with tingles would you be happy or sad?
>>
>>103074576
I simply accept the spine shivers.
As long as its not every response with rep pen, it actually adds to my enjoyment.
>>
>>103074576
tingies!!!!
>>
>>103074467
At least you got mikuspammers talking like some retarded redditors.
>>
>>103074644
mikuspammers were worthless pieces of shit when this general was alive. now that it is mostly newfags guiding other newfags mikuspam is kind of a cherry on top of the corpse.
>>
File: BatterUpMiku.png (1.82 MB, 832x1216)
>>103074576
*hits you with a metal pipe*
>>
>>103073960
Kinda meh
https://github.com/HKUNLP/DiffuLLaMA
https://huggingface.co/diffusionfamily
>>
File: durrrrrrrr.jpg (5 KB, 180x170)
>>103074709
you ain't swinging shit with your arms tangled up like that
also when will you retards learn to check the eyes before uploading goddamn it takes two fucking seconds
>>
>>103074777
>expecting standards from sloppers
lol
>>
>>103068184
The /jp/ floodgates shall open and flush away the stevefags. Tomorrow's announcement will make it so.
>>
>>103073729
Mistral large
>>
What does /lmg/ think about /aicg/'s fine-tune?
>>103074677
>>
>>103074971
Download link?
>>
>>103074971
Anthropic are letting people finetune haiku soon. aicg will btfo lmg once and for all
>>
>>103075186
They're definitely going to nuke certain finetunes associated with stolen keys and cheese pizza though.
>>
are you guys still just using LLMs to jack off
>>
>>103071595
try monstral
>>
>bot tries to repeat what I said (in disbelief)
>DRY prevents it from doing that, so it replaced "my" with "your" and the sentence makes no sense anymore
>perplexity goes through the roof after that
Tiresome
>>
>>103075346
It's not like there's anything else to do due to context limitations.
>>
>>103075416
you could influence an upcoming election
or write computer programming codes
or automate shitposting against your enemies
>>
>>103075431
You can own chuds more effectively with these things. I do that on /v/ all the time.
>>
DEAD HOBBY
if the new shiny model isn't coming out every week it's over
>>
>>103075446
i hate chuds unless they're white, my idea is automatically report all brown flags that use racial slurs
>>
File: 6.png (104 KB, 668x672)
>>103075563
>flags
>>>/pol/
>>
>>103075563
chud is a mental state in most cases, so it's kinda right to hate them all equally for obvious reasons.
>>
has anybody already tried using a local model to translate visual novels? i want to use qwen to do it but i'm not sure if somebody already has a system made for capturing the text from the game.
>>
File: 1712115517918957.jpg (132 KB, 962x620)
>>103075557
>>
>>103075666
No, that has never occurred to anyone here.
>>
File: file.png (3 KB, 381x21)
Can this harm my GPU? I have been running this script for 7 hours now (it uses the GPU, if that wasn't already obvious) and there are 10 more hours to go.
>>
File: file.png (123 KB, 1147x178)
the madlad did it again
>>
>>103075626
>pol
retards
i'm from /int/
>>
>>103075768
I'd be more concerned about my ssd
>>
>>103075666
I tried to do it using ChatGPT in the past (when it was free through Scale) but it didn't work well at all.
Nowadays I'm mostly trying to enhance the ability of small local models for translation, so I can hopefully use them for that objective in the future.

> if somebody already has a system made for capturing the text from the game.
Use this:
https://github.com/HIllya51/LunaHook
https://github.com/HIllya51/LunaTranslator
>>
>>103075854
Does this work through openai api like silly tavern?
>>
>>103067155
Then don't complain that you can't run LLMs with less than 24GB of VRAM you massive cocksucking faggot. You were given a solution and you are ignoring it.
>>
>>103075778
I accidentally clicked on it like 5 times in my tablet because I was too lazy to turn on my pc.
>>
>>103076003
Yes
>>
>>103076043
Nta but it's kinda funny considering that applel is working on some diffusion+llm solution; that's how apple will win the "llm at home" race.
>>
I accidentally
>>
>>103072261
You overcooked sovits; 96 epochs is already max quality for deep voices (the base model has a bias toward higher pitch).
>>
>>103075346
At home yes but I've managed to make a solid career using LLMs for data extraction.
>>
>>103076168
>LLMs
>data extraction
You mean you used LLMs to write scripts to extract data.
>>
>>103076191
No. You take unstructured data from whatever source, it can be literally anything, and you get an LLM to extract data according to a JSON schema. Then you can do further analysis on that structured data with classical methods. It's incredibly powerful. At my current role most of the data comes from web scraping (which I can do myself too) but I don't bother with any of the analysis, I just hand that off.
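To make "according to a JSON schema" concrete, the schema is just a fixed shape you ask the model to fill and nothing else — a toy example (the fields obviously depend on your task):

# A toy extraction schema: the model fills exactly these fields from the source text.
CONTRACT_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total_cost": {"type": "number"},
        "currency": {"type": "string"},
        "duration_years": {"type": "integer"},
    },
    "required": ["vendor", "total_cost", "currency", "duration_years"],
}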
>>
>>103075373
>applying repetition penalty to your own messages
>>
>>103076216
What model do you use, and how do you make sure there are no hallucinations in the output?
>>
>>103076216
So you're using function calling or grammar-based sampling? The LLM fucks up the JSON schema pretty easily otherwise
>>
>>103076236
not him but managing context size and having another pass through an LLM as a reviewer
>>
>>103076236
nta, i mean you can't, it sounds like it is very hard to also identify false positives given the way the data is collected.
>>
>>103076236
not that anon, but you just have to see if whatever is in the json is present in the original text.
>>
File: (you).png (303 KB, 1024x1024)
>>103075806
go back
>>
>>103075806
Still a polshartie, you low IQ subhumans are not welcome here.
>>
>>103076236
There's one simpler task I've shifted over to Gemma, but the rest are using 4o. Realistically most 70B models or a 405B would do fine, but we lack hardware.
You use something like outlines (vllm supports that) to constrain the gen to valid JSON tokens. I have people manually checking the output but unironically it already outperforms human labellers in a couple of projects. Mostly because humans get bored and stop giving a shit.
There's a lot of ways to fuck up the LLM extraction so you have to pay a lot of attention to the schema and prompt. Talk to actual human domain experts.
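The constrained-generation piece, against a vLLM OpenAI-compatible server, comes out roughly like this — guided_json is how vLLM exposes the outlines-style guided decoding, but treat the exact field name as an assumption for whatever version you're running:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

schema = {
    "type": "object",
    "properties": {
        "total_cost": {"type": "number"},
        "currency": {"type": "string"},
        "duration_years": {"type": "integer"},
    },
    "required": ["total_cost", "currency", "duration_years"],
}

resp = client.chat.completions.create(
    model="whatever-the-server-is-serving",
    messages=[
        {"role": "system", "content": "Extract the contract details as JSON."},
        {"role": "user", "content": "The deal is worth $4m over 6 years."},
    ],
    extra_body={"guided_json": schema},  # decoding is constrained to schema-valid JSON
)
print(resp.choices[0].message.content)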
>>
>>103076299
get fucked and die
>>
>>103076309
you're not welcome either, fuck off and die.
>>
>>103076216
what do you use for proxies
>>
>>103076341
I've been sitting here since day one of the llama-1 leak, unlike you polskin tourists.
>>
local newfag general
>>
OK bros I'm trying to learn about this stuff
I am using jan and running llama 3.2 (3B)
Am I doing it right? What should I be doing instead?
>>
>>103076353
You should be lurking instead of asking stupid questions.
>>
>>103076349
I'm not even from pol. your gatekeeping trash attitude is not welcome here. get lost.
>>
>>103076353
Yes it's fine for starters. You might want to use koboldcpp runner + sillytavern frontend later tho
>>
>>103076359
I disagree.
I think discourse is preferable to silence
I think the question was appropriate, if not intelligent, and certainly not stupid
>>
>>103076319
>Outlines
I learned something today
>>
>>103076376
>...and then everybody clapped.
>>
I think i'm missing something here. if anons do not want to help each other, then they can leave. If you want to participate and grow this community, then do so.
>>
>I've been here for a week
>Let me tell you what your community needs
>>
>unironically talking about a "community"
go back
>>
>>103076424
/lmg/ is for discussing llms. If you have or did something cool, share it. This is not the begging for tech support and spoonfeeding general. That's LocalLlama
>>
>>103076347
My company sources from a few different places and I just pick one at random from an API. Mostly rayobyte I think.
>>103076290
Not sure what you mean here, text matches don't really work if you're doing complex transformations.
Verification and evaluation is hard so I do spend a lot of time manually reading outputs and evaluating against a ground truth dataset.
But when an extraction pipeline is actually live, the only thing you can really do is have a manual check (or another LLM check, lmao; I do have that for one thing).

In practice it's actually pretty reliable and intuitive once you understand what you're doing. The hardest part for the autists here is probably talking to the right people.

Some footguns:
- (few-shot) example selection is not trivial and you can cause worse results if your examples are weighted badly/not updated when the schema is updated/contradict the schema in subtle ways
- don't make LLMs do calculations. But you can very reliably get them to extract numbers and then perform calculations in post (see the sketch at the end of this post). E.g. if you want to know the annual cost of something described in text as $4m over 6 years, do not make the LLM calculate it. Extract total cost, years, unit separately and then do the calculation.
- don't waste too much time on prompt magic bullshit. Most of my prompts are about 2-3 sentences long and all of the necessary information is conveyed in the schema. Spend more time thinking about the fields you're extracting and the overall business task.

Feel free to ask more questions. I need to do some writeup for my incoming zoomer juniors anyway.
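And the extract-then-calculate footgun spelled out — the model only fills the fields, ordinary code does the arithmetic (the values below are hand-written stand-ins for what the model would return):

# What the LLM returned for "The deal is worth $4m over 6 years" (hand-written stand-ins here).
extracted = {"total_cost": 4_000_000, "currency": "USD", "duration_years": 6}

# The arithmetic happens in plain code, never inside the model.
annual_cost = extracted["total_cost"] / extracted["duration_years"]
print(f'{annual_cost:,.0f} {extracted["currency"]} per year')  # 666,667 USD per year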
>>
Bimonthly check: What's the best model for 24gb vram right now?
>>
>>103076489
That's the mentality that's killing the general, stupid incel loser, go touch grass.
>>
>>103076503
qwen2.5 / nemotron / finetune of one of those
>>
I'm catching up now that 1.0 is out. Honestly. I kind of feel like 0.5 is the best one out of them.
>>
>>103076505
People asking which model fits in their 3090 or how to set up sillytavern or kobold 10 times per thread counts as active, but it's just as dead as no posts at all.
>>
So this is the mythical Chad Road... Not bad...
>>
>>103075854
>https://github.com/HIllya51/LunaTranslator
Needs tons of improvements. It is kinda unusable with the vntl model because you can't really set up a proper prompt format.
>>
Are there any TTS models good for cooming that can be run on a 12gb AMD card on linux?
I don't care about voice cloning, I just want a nice believable voice.
I've looked through some options, but this shit changes week after week, so I'm curious what the current best choice is.
>>
>>103076502
What model size are you using for that? I guess the potential errors go down when you use bigger models, but then the inference time goes up.
>>
>>103076550
hi, what model would fit in my 3090? and what is a .pth file?
>>
guys i tried install koboldcpp but python gave me error. what do?
>>
>>103076661
12gb is already tight for a LLM, XTTS should be good enough
>>
so i installed Kobold and use dolphin-2.2.1-mistral-7b.Q4_K_M.gguf

i have an amd rx6000 series card and it works well with vulkan.
also tried the rocm version but it just crashes.

is there anything better i can run on my machine? or is that fine?
also what's the best (uncensored) model i can run on my pc?

i just had incredible sex talk with a dominant demon mistress. but she didn't really become too aggressive. e.g. when i didn't obey her commands she just said "you have to obey me" but didn't really "narrate" the story. like writing what's happening. it's all more like talking.

does that just require more fine-tuning or a better model? sorry that you have to read about my fetish, but i need specific advice right?
>>
>>103076669
.pth is a Python file. Use Python to execute it.
>>
who are RAM?
>>
>>103076712
Are you using sillytavern or just the kobold GUI? Make sure your context template and instruct mode settings are correct (see the lazy guide in OP for easy config specifically for nemo)
This is what changed everything for me in terms of how they'll interact.
>>
dear community members. Have you considered getting a discord server?
>>
File: 1702066646729561.gif (1.92 MB, 498x470)
This clown general
>>
>>103076668
This is why you use something like VLLM to compute rows in parallel. Pipelines also get huge benefit from kv caching.
>>
>>103076751
I think a discord server would be a wonderful addition to our community. Would you like to make one for us?
>>
>>103076751
Great idea desu. I am tired of people being hostile here.
>>
>>103061671
>>103061671
>>103061671
Next thread
>>
>>103076751
>>103076778
>>103076806
Dear Esteemed Members of the 4chan LLM Community,

I hope this message finds you well. I am writing to address the recent proposal to establish a Discord server for our group. While I understand the appeal of this platform, I kindly urge you to reconsider and explore an alternative solution: creating a thread on Hugging Face.

Transitioning to Discord may inadvertently foster isolation, as discussions would no longer be readily accessible to outsiders who might offer valuable insights. Moreover, Discord's recent atmosphere has not been particularly welcoming to certain demographics, including cis white males, which could potentially lead to feelings of discrimination among our members. Additionally, a tight-knit community such as the one on Discord might encourage the prolonged holding of grievances and facilitate toxic behaviors like doxxing.

In contrast, Hugging Face presents a compelling alternative. As a platform already populated with the models we discuss, it offers a convenient and relevant space for our conversations. The community on Hugging Face is known for its friendliness, and unlike platforms such as Reddit, it is not overly moderated, allowing for more organic and free-flowing discussions. Most importantly, Hugging Face is a hub where people actively use and discuss LLMs, making it an ideal environment for our community.

I strongly believe that creating a thread on Hugging Face would be more beneficial for our group, fostering a more inclusive, productive, and enjoyable discussion space. Thank you for considering this alternative. I look forward to our continued interactions and the insightful discussions that lie ahead.

Yours sincerely,

mistral-large-2407
>>
File: file.png (20 KB, 721x319)
>>103076659
If you use something like tabbyAPI you can set the prompt format in the config, koboldcpp also allows you to configure the prompt format
>>
>>103076773
is vLLM faster than tabbyAPI?
>>
>>103077016
I haven't tried tabby
>>
>>103077029
Huh, ok. I guess I should give vLLM a try anyway, I always use tabbyAPI because it supports continuous batching but I feel like it isn't very optimized for this kind of use.
>>
>>103077016
Not the same use case. vLLM is for serving multiple users or running multiple prompts in parallel
>>
>>103077184
You can run multiple prompts in parallel using tabbyAPI, that's why I asked if vLLM is faster.
>>
>>103076712
which 6000 series card? I have a 6700x and use the same setup (vulkan via kobold, though I use sillytavern for chats) but I'm using Mistral Nemo Instruct 12b Q5, using 32 gpu layers, at 16384 context size (though some people prefer smaller) along with the settings >>103076741 describes.
For me, I can definitely get them to be mean, and I can definitely get them to describe things like a book/story. This partially has to do with the character card and starting prompt.
I initially tried some settings I saw randomly somewhere that led to it being more chat like and I agree it can be quite sterile and boring though perhaps more "realistic" in a sense.
Could be your settings (don't forget to set context size inside of sillytavern along with all the other stuff that other anon/the lazy guide says to do. I use 400 context max per message and 16384 total for the model I use), could be a shitty character card.
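For reference, the equivalent koboldcpp launch looks roughly like this — flag names are from koboldcpp's --help, but adjust layers/context to what your card actually tolerates:

python koboldcpp.py --model Mistral-Nemo-Instruct-2407-Q5_K_M.gguf --usevulkan --gpulayers 32 --contextsize 16384

Then point SillyTavern at the KoboldCpp API it exposes (default http://localhost:5001) and set the same 16384 context inside ST.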
>>
>>103076959
>Dear Esteemed Members
https://youtu.be/NLIY8Mq49e0
>>
>>103077221
6700xt*
>>
>>103077198
vLLM is faster if you can fit an FP16/FP8 model, as they have more optimizations going on. Their GGUF support is ass.
>>
>>103076712
>>103077221
Oh and just to be super clear since I know how confusing this is at first. I'm specifically using Mistral-Nemo-12B-Instruct-2407-GGUF.
>>
Ok so maybe I was a bit hasty. Noob 0.1, 0.5, and 1.0 all have their strengths and weaknesses I guess. 1.0 is CRAZY good at the "outstretched hand" prompt I have. So good that actually only 3 out of the 15 images I generated had unacceptably drawn hands. In contrast, now that I test this same prompt with the other models again, it is maybe like the opposite, where only 20% of the hands are good. Still cool that they could get hands right that often, but 1.0 is on another level. Maybe overbaking epochs really is all you need.

0.5 also has a weird thing where the colors are biased towards red/yellow on this prompt. Perhaps bleeding in from the Teto-related tags.
>>
>>103077300
I like this Teto
>>
>>103077338
>>103077338
>>103077338
>>
>>103077016
Tabby is faster for single user and possibly comparable for throughput if the exl2 gains still carry. Qwen 72b got 12 t/s in vllm vs 18 t/s in tabby. vLLM's claim to fame is total throughput. Tabby also has continuous batching, so it might be comparable



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.