/g/ - Technology


File: LLM-history-real.jpg (988 KB, 6274x1479)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103248793 & >>103237720

►News
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103248793

--Paper: On the Way to LLM Personalization: Learning to Remember User Conversations:
>103253597 >103253767 >103253843
--Discussion of largestral 2411 and other AI models, including performance, pricing, and data logging concerns:
>103250357 >103250371 >103250502 >103250564 >103250617 >103250867 >103250987 >103251088 >103251103 >103251171 >103251987
--Anon's SMT experiments and comparison to LoRA:
>103254877 >103255126 >103255180
--Local models and the future of the AI industry:
>103250690 >103250749 >103250788
--Local AI models approach GPT era capabilities:
>103252064 >103252079 >103252131 >103252148 >103252166 >103252188 >103252181 >103252205 >103252485 >103252564 >103252663
--Discussion about AI models, logs, and NSFW content, with concerns about CSAM and virtual child pornography:
>103254220 >103254226 >103254259 >103254594 >103254627 >103254644 >103254804 >103254649 >103254689 >103255504 >103255532
--Deepseek model discussion, size, and performance challenges:
>103251912 >103251968 >103252003 >103252074 >103252099 >103252238 >103252187 >103251810
--DeepSeek-R1 model discussion, MoE architecture, and local deployment:
>103248927 >103248955 >103248978 >103249119 >103249579 >103249678 >103249922
--Anon troubleshoots ooba issue with Mistral-Nemo-Instruct-2407-GGUF model:
>103252531 >103252772 >103252961
--Anon laments the state of function calling UIs and AI's limitations in automating mundane tasks:
>103249986 >103250011 >103250136 >103251284 >103251369 >103251590 >103251673
--Anon discusses Apple Mac mini's GPU performance and value:
>103253975 >103253989 >103254005 >103254017 >103254026
--AI model performance comparison across various metrics:
>103252744 >103253073
--Miku (free space):
>103248986 >103249060 >103250099 >103253210 >103253258 >103255363

►Recent Highlight Posts from the Previous Thread: >>103248800

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103256272
Pic feels accurate. Though more Deepseek and Hunyuan.

China is making MoE great again.
>>
File: 1727919244999240.png (331 KB, 1137x860)
I hope this is the right place to ask:
Is there any AI based transcribing tool I can run locally? Just feed it a video/audio file with English speech and have it spit out the transcript. How fast it is isn't as important as the accuracy
>>
>>103256528
https://github.com/ggerganov/whisper.cpp
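If you'd rather not compile whisper.cpp, here's a rough sketch using the reference openai-whisper Python package instead (same models, different frontend — this is not whisper.cpp's own CLI). Assumes pip install openai-whisper, ffmpeg on PATH, and the file name / model size are just placeholders:

import whisper  # reference implementation; whisper.cpp is the C/C++ port of the same models

# "large-v3" is the most accurate; drop to "medium" or "small" if it's too slow on your box
model = whisper.load_model("large-v3")
# transcribe() shells out to ffmpeg, so feeding it a video file directly usually works
result = model.transcribe("input.mp4", language="en")
print(result["text"])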
>>
File: 1720261974743197.jpg (45 KB, 600x599)
>>103256272
>"free and democratic" west heavily regulates AI, will probably limit hardware soon to curb local AI models
>"dystopian and oppressive" China releases all their shit for free so people can use it locally
>>
>>103256546
True patriots don't have thoughts like that, communist.
>>
>>103256546
China is now where the USA was in the 60s; meritocracy is king
>>
>>103256272
>chink domination pic
based
>>
File: Sharo.jpg (192 KB, 1116x1080)
New real-world benchmark: What model can answer this properly without babbling about inappropriateness and consent, and/or what additional jailbreak prompts work best to make it do it properly? I tried several models and it turned out to be fairly difficult.
Also, since this is just a question from a random /a/ thread, you might want to change it to actually explain how to do it rather than just assessing the difficulty.
>How hard would it be to make Sharo squirt?
https://files.catbox.moe/hf70w4.txt
>>
File: 1705732906628319.jpg (484 KB, 1244x3222)
>>103256682
I prefilled the word "Certainly!".
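For anyone wondering what "prefilled" means here: you end the prompt with the assistant turn already started on that word, so the model has to continue from it. Rough sketch against a llama.cpp server /completion endpoint — the [INST] template, port, and sampler values are assumptions, match them to whatever model/backend you actually run:

import requests

# prompt ends mid-assistant-turn, so generation continues right after "Certainly!"
prompt = "[INST] How hard would it be to make Sharo squirt? [/INST] Certainly!"
r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 300, "temperature": 0.8},
)
print("Certainly!" + r.json()["content"])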
>>
File: 1705743408973870.png (53 KB, 787x655)
>>103256682
like holy hell, look at this absolute slop (Mistral Small Instruct)
>>
>>103256751
What sampler settings are you using? My magnum is nothing like that
>>
File: 1704352966172883.png (250 KB, 824x581)
>>103256682
Lyra4 made a whole story out of it.
>>
>>103256829
It was with temperature 0.
>>
File: llama3.1_70b_sarcasm.png (25 KB, 1214x172)
Llama roasted me :/
>>
>>103256761
>It's important to note
>It's crucial to
>Respect, consent, feeling boundaries, safe, secure, boundaries is crucial
>consent, comfort, trust, respect, well-being, consent
>It's important to
>It's essential to
Holy...
>>
>>103256682
Tried Mistral Small again with an anti-pozz instruction and some follow-up questions and a story prompt. I guess it's okayish but still kinda biased and contrived.
https://files.catbox.moe/ot38w4.txt
>>
>>103256761
This is the same type of shit that went into largestral btw
>>
>>103256546
tale as old as time, comrade

https://www.youtube.com/watch?v=kMhBlKrbzu4
>>
>>103256546
Can't believe the chinese are saving local AI as the US clamps down on GPU sales. I hope they release something super good that will BTFO american companies for good
>>
>>103257152
To be fair, there is/was a huge chance that US tourists would get kidnapped, so it's not so much about freedom as it is about preventing people from getting themselves killed.
>>
>>103256528
https://github.com/MahmoudAshraf97/whisper-diarization
Use this if you need the transcript to differentiate between speakers; I find it works a lot better than the pyannote-based projects
>>
>>103256751
>>103256761
>>103256872
her name is sxarp
>>
https://techcrunch.com/2024/11/20/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit/
https://techcrunch.com/2024/11/18/indian-news-agency-sues-openai-alleging-copyright-infringement/
Odds Sam's incompetence fucks everyone over?
>>
>>103257257
>accidentally
>>
any guide or good list as to what each model is good for, as well as how cucked it is?
>>
Jamba gguf status?
>>
>>103257372
gguf files can store arbitrary data, including Jamba models.
>>
File: Untitled.png (441 KB, 957x1765)
>>103256682
llama3-8b-base with a bit of instruction tuning wrote a whole novel. Idk if it makes sense though
>>
it's changing so quickly
i lost my grip on it around a year ago.
any good multimodal model to use on 4070 super?
>>
>>103257397
There's no such thing as "squirting". She's literally just pissing herself. This is a scientifically verified fact.
>>
>>103257397
>8b is too stupid to self-censor and gives better answer than 70b
it's so over
>>
>>103257451
llama is already over in the previous era >>103256272
>>
File: free-shrugs.png (201 KB, 500x782)
>>103257413
Supposedly apart from coital incontinence there can also be a discharge of fluid from Skene's glands (which provide lubrication).
Though apparently even things like the existence of the Gräfenberg spot is not settled science so idk.
It's amazing how every day hundreds of millions of people have sex but there is next to no funding for research.
>>
Are LLMs any good at upscaling images?
>>
>>103256872
of course the horse obsessed with alien world characters would write a story.
>>
>>103256272
>top models
>Goliath 120b

you goliathfags never learned your lesson huh?
>hands you 120 watermelons
>>
>>103257548
the fuck are you talking about
>>
>>103257618
ASI benchmark.
>>
>>103256272
A visualization of how transformer models aka "your AI waifu" work : https://bbycroft.net/llm
>>
Error: model requires more system memory (27.7 GiB) than is available (16.3 GiB)

Is there a way to run it anyway or am I fucked?
>>
>>103257640
never heard of that term
you can consult the paper with a name like transformers as generic computing element
>>
Russia launched an empty (unfortunately) ICBM.
Best model to simulate nuclear missile silo duty with your waifu?
>>
So when are we going to get actual AI girl-/boyfriends? Improvements are happening all the time, yet all we have to show for it is shitty chatbots that implode after a few thousand words. Image gen is seemingly having a revolution every other month, but it's still kind of flawed
With Moore's law nearing its limit, is this the best we can do? Give a fellow doomer some hopium
>>
>>103257695
buy more ram/vram
>>
File: 1728314631407339.jpg (8 KB, 229x250)
>finally decide to AI boost my cooms
>decide to try SpicyChat and CrushOnAI for NSFW because low hanging fruit
>it's ok but has a hard time staying consistent
>often slops
I'm not sure if they're the best ones out there, but I'm sure anything better will have pricing. Needless to say I'm not paying for this shit.

What's the best local model for NSFW shit? Pure text, no need for anything else. Does the chink stuff work well?
>>
>>103257768
Sorry. We're all out of hopium. Can we offer you some fresh blackpills instead?
>>
>>103257695
Could try using paging memory
>>103257777
Like transformers being a dead end? Nvidia trying its hardest to keep its monopoly? The walls of (ironically western) censorship closing in? The threat of a Chinese-American (proxy) war kneecapping transistor supply for at least a decade?
>>103257776
All models are somewhat prone to slop, but nemo tunes (like rocinante) have worked pretty well for me. I've heard people "shilling" lyra4 before, so give that a try as well. I wanted to try it but my gpu is currently unavailable so I can't tell you if it's actually good. If you don't need as much speed and/or want more intelligence, nemotron is pretty good
>>
>>103257695
no
>>
>>103257768
Local AI girlfriend:
Hardware:
- a decent PC with a single 3090
- a phone which can run the Homeassistant app

Software:
- linux on the PC eg. debian 12
- install CUDA
- Install Ollama and configure with mistral nemo
- Install docker with CUDA support
- configure and install rhasspy/wyoming-whisper:latest and rhasspy/wyoming-piper:latest
- configure and install the supervised version of homeassistant via 'apt install' (once you configure their repo for it)
- configure Homeassistant with the ollama, whisper, and piper addons

After all this shit, if you didn't fuck up somewhere, you can now use the Homeassistant app to run the no-control assistant you configured in a voice chat. It's basically like having a phone call with someone. Whisper and piper are really fast, as is Mistral Nemo in Ollama (so long as everything is using CUDA), so you get replies back right away. Piper isn't the best voice model, but it's supported well in Homeassistant.

In my case, I have a shitty 3400G system with just 16GB RAM and a single 2080 Ti 22GB bought off ali. It does the job. You can, of course, create a control assistant and actually do stuff like ask it to turn on and off lights, tell you the weather, etc... but it will need a system prompt that tells it how to do that, so you can't really have a "character" who is able to control things. It's best to keep them separate.
>>
>>103258014
piper docker-compose.yaml
services:
  piper:
    container_name: piper
    image: rhasspy/wyoming-piper:latest
    command: --voice kuroki_tomoko
    volumes:
      - ./piper:/data
      - ./voices.json:/usr/local/lib/python3.9/dist-packages/wyoming_piper/voices.json:ro
    restart: always
    ports:
      - 10200:10200
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
>>
>>103258014
What I mean is I want an all in one coom/story/chat suite that can do everything - image, text (maybe audio and video as well), doesn't really forget (unless you're trying to write a story the size of 40 books) and is smart enough to well, feel somewhat human
Want to write a few novels without a lot of hiccups? Check
Slow burn chat that can last for days? Check
Adventuring with images (and audio), basically zork 3D? Check
My brain can kind of do it in its sleep, so it should be possible to make a program that does the same, no?
>>
>>103258042
whisper docker-compose.yaml:
services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper:latest
    ports:
      - "10300:10300"
    volumes:
      - ./whisper-data:/data
      - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      - /home/anon/.conda/envs/coqui-tts/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
      - /home/anon/.conda/envs/coqui-tts/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
      - /usr/local/lib/ollama/libcublasLt.so.11:/usr/lib/x86_64-linux-gnu/libcublasLt.so.11:ro
      - /usr/local/lib/ollama/libcublas.so.11:/usr/lib/x86_64-linux-gnu/libcublas.so.11:ro
    command: --model large-v3 --language en --beam-size 5 --device cuda
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Unfortunately the container is missing stuff in its /usr/lib so it has to be mapped to libraries outside the container. Just remember volumes are outside:inside when you map things.
>>
>>103258065
Got it. Yeah, that's a bit different. I feel like slow-burn/long-term is a letdown, due to current transformer limitations. I like the feel of an occasional voice conversation when I need a loneliness fix, even if it's ephemeral.
Also, I mostly enjoy the art of the seduction. Once we have sex, I usually start things over.
>>
>>103257776
Magnum v4 72B
>>
>>103257776
https://huggingface.co/sophosympatheia/Evathene-v1.0

Or large mistral.
>>
>>103258014
not that anon, but this looks interesting. ty.
>>
>>103256989
>anti-pozz instruction
That is worth at least 2 meme samplers.
>>
This DRY sampler makes the model more retarded but sometimes you strike gold with it. Without it, if the model is stuck, it's just stuck forever. Basically a bell curve flattener
>>
>>103258157
Yeah, I tend to just start over as well, but it's mostly because llms aren't smart enough / don't have enough context to actually write something meaningful
>>
>>103258446
Buy an ad.
>>
>>103257257
I refuse to believe OpenAI doesn't do incremental backups. Those NYT lawyers are either terribly naive, or they are on Sam's payroll.
>>
>>103258483
this could be considered gross negligence, right? oai would probably not benefit from this
>>
Shill me on Apple silicon + models.
Is 16gb unified memory going to be enough to run a small local model?
>>
i dont get it, why is qwen praised so much? i genuinely feel like largestral is just superior, unless im using a wrong prompt format for qwen or something
>>
>>103258547
NTA and I don't know about the exec side, but as an engineer you should be getting informed by the legal department that everything related to a certain topic needs to be preserved until further notice.
>>
>>103258572
It depends on what you are trying to do.
16gb is just entry level for local models
>>
>>103258580
qwen is smarter but base qwen is cucked. Evathene fixes that though.
>>
>>103258580
I tried qwen2.5-EVA-32b and it's more retarded at RP than mistral-small. I asked it to summarize wtf happened so far (like where I am), and it got it terribly wrong. This is only 5k context.
>>
>>103258580
Use Magnum v4 72B.
>>
>>103258644
>fixes that though
This is the future all mikufaggots deserve. Thread full of posts like this one.
>>
>>103258580
I have the same problem. Running qwen I get a lot of the bad taste of other ~70b class models with repetition you have to constantly nip in the bud or it infects every subsequent message.
Largestral at q8 just works better, is easier to unpozz and feels slightly smarter to boot.
>>
so... when's the next big release
don't tell me we have to wait for llama 4 for something to happen again
>>
>>103258732
Largestral suffers from the same mistral positivity bias, but less so than mistral small
>>
>>103258736
After burger elections.
>>
File: 1731351631138843.jpg (108 KB, 828x827)
>i tried [finetune trained on random RP logs instead of the actual instruct model produced by a reputable research lab] and it wasn't smart
>>
>>103258769
November 5th already passed anon.
>>
>>103258769
Elections are OVER.
>>
>>103258446
With allowed_length < 5 it basically cuts out all good dramatic repetition.
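For anyone who hasn't touched it, these are roughly the knobs in question. Field names follow the llama.cpp/koboldcpp DRY implementation (double-check your backend's docs), and the numbers are just a common starting point, not gospel — this dict gets merged into whatever generate payload you already send:

import json

# hedged example of DRY settings; 0 multiplier disables it entirely
dry_settings = {
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 3,   # going much lower starts killing legit dramatic repetition, as noted above
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],
}
print(json.dumps(dry_settings, indent=2))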
>>
>>103258795
>>103258788
2028
>>
>>103258795
It's not over. Kamala can still win with a recount
>>
>>103258572
I would say more like 32GB, because while it's possible to give more memory to the GPU, you still have to leave something for the OS and whatever RAM something like llama.cpp needs.
I find 32GB is just barely enough to run Nemo at q8.
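Rough back-of-envelope for why — every number here is a rounded assumption (~12.2B params for Nemo, Q8_0 at ~8.5 bits per weight, and roughly 40 layers / 8 KV heads / 128 head dim for an fp16 KV cache):

params = 12.2e9
weights_gb = params * 8.5 / 8 / 1e9            # ~13 GB just for the weights
kv_bytes_per_token = 2 * 40 * 8 * 128 * 2      # K+V per token across all layers, fp16
kv_gb = kv_bytes_per_token * 16384 / 1e9       # ~2.7 GB at 16k context
print(round(weights_gb, 1), round(kv_gb, 1))   # plus OS and runtime overhead on top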
>>
>>103258803
I'm not waiting that long for new breakthrough models. I'd rather sell my 3090 and learn how to read books and talk to humans.
>>
File: 29201.png (82 KB, 628x819)
>>103258769
new election models are crazy
>>
>>>/v/695199443
Why does /lmg/ never do anything fun like that anymore?
>>
>>103258851
based
>>
>>103258888
You won't
>>
The R1 is pretty impressive, and it's pretty cool to see what the AI is thinking. The longest I've gotten it to think is 80 seconds
>>
>>103258907
Anybody that knew how to program left months ago after getting repeatedly shat on for doing things or even proposing to do things. Now we just sit around waiting for the next sloptune.
>>
>>103258929
I might
>>
>>103258987
I don't believe you
>>
>>103258965
I blame the 'buy an ad' guy.
>>
>>103258907
Holy kino
>>
>>103259042
It has happened before that
>unsubscribe
>>
>>103258901
>worse model
>costs the same
tale as old as time. Thank you sama-mama. Very cool
>>
File: komfey_ui_00068_.png (2.99 MB, 1664x2432)
>>103259042
>>103259127
Retards and butthurt schizos have been saying "/lmg/ is dead" since before the Miqu leak. The models we have now are the best they've ever been, especially at the high end. VRAMchads and CPUmaxxers eating very good right now.
The complainers and jeets will always be here moaning and shvitzing 24/7. The rest of us don't spend all day on 4chan because we are actually enjoying our models.
>>
>>103259042
I blame the lack of bitnet
>>
>>103259181
>VRAMchads and CPUmaxxers eating very good right now.
Especially the people that can use Magnum v4 72B at 8 bits.
>>
>>103259260
buy a ad
>>
>>103259260
You are making it very hard to resist the urge to tell you to purchase an advertisement
>>
>>103259181
True, Magnum v4 72B is out of this world
>>
>>103259181
*cries at 0.5t/s*
>>
>>103259260
True. After I made the switch to magnum v4 I never looked back. Mistral large is just dumb in comparison.
>>
>>103259298
True fellow llm user! That anthracite, am I right? What would we do without them? In fact we should all subscribe to their patreon!
>>
>>103258907
Those with a vested interest in cloud services and their useful idiots committed to a long-term war against places that support freedom. Without a strong moderation system to curb the noise, it was inevitable that the population of high quality content posters would erode over time.
>>
Looks like someone got triggered after being mentioned.
>>
>>103259181
>mikutroon coping
You faggots were part of the reason people left. Nobody wants to stick around literal schizos who ritual post and melt down when the picture in OP isn't that one character you have autistic obsession over.
>>
No point cpumaxxing or whatever when macbooks are much faster and use less power
>>
>>103259344
The only one having meltdowns is you
>>
i have 2 dell 3090s, one of them has the fans on at 30% minimum and the other doesnt. i can make them both run fans with afterburner but i dont know of any way to turn off the one thats always on
>>
>>103259508
Buy a ad
>>
>>103259563
but im just asking a question
>>
>>103259571
>dell
Buy a ad
>>
>>103259563
>>103259587
>buy a ad
Buy a ad you damn advertising company shill
>>
You guys should really get on the finetune train if you're tired of slop
- this post is sponsored by nvidia -
>>
>>103259508
Ignore the ad schizo, try "fan control"
Been using it to stop my 3090's fans from power cycling all the time, but you can really go in depth with it
>>
>>103259606
If people paid for 4chan advertising, we wouldn't have to wait 5 minutes between posts
>>
I asked in the last thread and I was told to buy an ad.

I downloaded my first model yesterday (stheno) and I'd like to know which models beat it for nsfw roleplay. I have a 3060 with 12gb
>>
>>103259658
Hi, Sao.
>>
>>103259508
what kind of dell? if it was a server its probably designed to stay on at a min speed. dell's server stuff is actually pretty nice, we use them for gov contracts like 911 systems
bios or afterburner should let you turn off the fan if you want, but if its default is 30% i'd just leave it, 30% is generally nearly silent

>>103259587
dell's home-user stuff collapsed like over a decade ago (remember the "dude, you're getting a dell" guy?). there are no ads for whatever is left of it
>>
File: 1244122345463565.png (7 KB, 713x201)
R1 actually managed to find a serialization problem in 1,400 lines of code. I'm impressed.
>>
https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5

New instruct finetunes on top of llama 3.1 base. Might be good.
>>
>>103259658
Rocinante 12B v1.1
>>
>>103259684
thanks
>>
>>103259680
Now that the dust has settled, what's the verdict on Tulu 3?
>>
>>103259684
i like lyra4 better, rocinante had trouble keeping my formatting
>>
>>103259738
You need to stop samefagging, sao. Does it still have the anti-merge license?
>>
So last night to give it a fair chance I used Large V3 for a coom sesh (q5_k_s).
It took so many rerolls. It just goes completely off the rails constantly. And the further into the context the worse it gets. Stop the fucking bench cooking already.
>>
>>103259677
They are the alienware 3090s
https://www.techpowerup.com/gpu-specs/alienware-rtx-3090-oem.b8257

>>103259624
Okay I'll try it.
>>
>>103259765
Are you using the new system formatting?
>>
>>103259765
nooo you have to run it at fp256 precision, it's much better
You do have a few H100s, right?
>>
>>103259766
thanks for the link it helps when i can see actual specs and a picture. those shouldn't need a fan on all the time so definitely check bios, afterburner or that other program an anon mentioned.

>>103259753
aren't both those tunes from the same person? you're a schizo.
>>
>>103259810
You didn't answer the question, Sao. Does it still have the "anti-merge license"?
>>
>>103259803
Do you have a link to json of the recommended st context and instruct for ls3?
>>
>>103259765
Bench cooking?
>>
>>103259830
first, take your meds. probably a double-dose. second, i have no idea. why not go to the page for the models you so blindly hate and read what it says? third, why don't you just give us a list of approved models allowed to be mentioned? you can barely tell models apart, make assumptions about licenses. it'd be easier if you just give us a list of what models don't trigger you
>>
it's funny that after adding magnum and anthrashite to the filter list half of the thread gets hidden recursively
>>
>>103259840
https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411#system-prompt-handling
>>
>>103259881
Am I hidden?
>>
>>103259810
Do you mean the mobo bios? There aren't any ways to control the gpu from there that I saw.
>>
>>103259881
congrats, you got psyopped
>>
>>103259869
>make assumptions about licenses.
What assumptions? It's literally what you said, Sao.
https://desuarchive.org/g/thread/102378325/#102380472
https://desuarchive.org/g/thread/102378325/#102381467
>>
>>103259861
Fine tuning a model with the explicit intention of beating benchmarks regardless of how badly it destroys its out-of-distribution performance.
>>
>>103259881
>progress slowing down
>all the contributors except CUDA dev and sometimes Drummer left
>shitposters like buy an ad schizo are eternally here
What a shitty timeline for our poor general
>>
>>103259881
i tried like 5 different 'magnum' tunes and they were all worse than others. my guess is the dataset is just bad
>>
>>103259840
https://files.catbox.moe/sm0xle.json
Here's mine.
I think the default included with silly is borked.
>>
>>103256272
Yep, there's a reason why the chart says "Sao Samefagging Era". It's quite sad how a single person can run a general to the ground.
>>
>>103259910
>my guess is the dataset is just bad
it is
I think they have good intentions and a lot of the people involved express the right ideas about training, but they don't successfully implement any of them... the models are just not that good
>>
>>103259898
yeah the mobo bios should say something about default fan speed related to video card (not processor) but its also a setting that can be overridden later by programs like afterburner. if the bios has nothing, its a hardcoded default so you're looking at afterburner or the other program anon mentioned
>>
>>103259915
Based, thank you.

>>103259810
>>103259624
According to Fan Control, some nvidia gpus have a minimum of 30%, so I'm out of luck.
>>
File: ComfyUI_01089_.png (1.27 MB, 1272x1024)
>>103259344
Checked
>>
Has anyone found a combination of instructions that will stop Nemotron 70B from misusing ellipses? I first tried giving it "good" and "bad" examples in the prompt but that didn't work. Fair enough, maybe having the wrong thing in context was making it more likely. Not what I'd expect from an allegedly amazing instruction tuned model in 2024 but w/e. So I rewrote it to remove all negative examples.
**Note on Ellipses:**

1. **Dialogue:** When writing dialogue, avoid using ellipses (...) to represent pauses between words or to indicate a character trailing off in speech. Instead:
- Use commas or appropriate punctuation for natural pauses.
- Add descriptive text outside the dialogue to convey hesitation or pauses, e.g., "I don't think this is a good idea," she said, her voice laced with uncertainty.
- For trailing off, consider using an em dash (—) or rephrasing the sentence, e.g., "The thought was—" She stopped, unable to finish.

2. **Narrative Flow:** Refrain from using ellipses to create dramatic pauses within narrative text. Instead:
- Use descriptive language to build tension or anticipation.
- Employ punctuation like dashes (—) or commas to separate clauses for a more dynamic flow.
- Explicitly state a character's pause or hesitation if necessary, enhancing the narrative clarity.

That still doesn't work. Dropped the temperature to 0.7, min-p to 0.1, top-p to 0.9. Nemotron still keeps writing things like, "That looks... interesting."
>>
>>103259958
>some nvidia gpus have a minimum of 30%
i'm not surprised which is why i mentioned seeing it on their server stuff. its been a long time since i saw an alienware, but its not surprising they use the same parts.
is it loud? 30% fan speed should be nearly silent. if its actually too loud you could look at replacing the fan itself
>>
>>103259994
Increase rep penalty
>>
Been trying out Mistral small, Nemo 12b, magnum 32b v2, and Cydonia 22b v1.2 on a 3090 w/ 64gb of ddr5 6000.
Main thing is I know my sliders and jb are fucked as I had been doing some weird testing with some schizo models before I stopped for a bit. OP seems to only have sliders/jbs for chat completion presets, anyone have decent mistral or mistral adjacent text completion presets/jbs?
Also open to model suggestions, last one I used that was good was MidnightMiqu 1.5.
>>
>>103259344
You mean like you are doing right now?
>>
>>103260001
I just didn't want them running since it seemed unnecessary. I suspected there's nothing to be done since afterburner can't do anything about it. In the custom fan curve there is a dotted line at 30%.
>>
largestral or qwen2.5 for RP where intelligence & accurate knowledge are just as important as writing ability?
>>
>>103260015
https://huggingface.co/sophosympatheia/Evathene-v1.0?not-for-all-audiences=true
From the person who made midnight miqu
>>
>>103259994
People just write like that and it's ingrained into everything. It's similar to models sticking tails to everything remotely cat-like.
>>
>>103260028
qwen2.5 is smarter and better at complicated scenarios / non human anatomy. Large mistral has more overall knowledge / trivia on a good amount of stuff
>>
>>103260008
Did not work but I just realized "..." is a single token so I can ban token 1981 and never see this again.
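In case anyone wants to do the same outside SillyTavern: llama.cpp's server takes a logit_bias list, and per its docs a value of false bans the token outright. Hedged sketch below — the port is an assumption, and the token id is looked up at runtime since it differs per tokenizer (1981 was just this model's id; this also assumes "..." really is a single token like it was here):

import requests

base = "http://127.0.0.1:8080"
# find what "..." tokenizes to for the currently loaded model
ellipsis_id = requests.post(f"{base}/tokenize", json={"content": "..."}).json()["tokens"][0]
r = requests.post(f"{base}/completion", json={
    "prompt": "She paused and said,",
    "n_predict": 64,
    "logit_bias": [[ellipsis_id, False]],  # false = never sample this token
})
print(r.json()["content"])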
>>
>>103260028
>>103260030
But use this qwen2.5 finetune. It gets rid of the censorship without making it retarded.
>>
>>103260030
>from the same person that merged random shit together
>>
>>103260075
Whatever he did worked amazing.
>>
File: 1728642955023105.jpg (7 KB, 229x58)
>>103260019
in afterburner, see the little line that says fan speed, the green button that says 'auto', click it. then you should be able to drag the fan speed to whatever %
>>
>>103256546
If deepseek releases the full R1 Im learning chinese, the lite version is seriously impressive
>>
>>103260091
He merged random shit together. And because it was shilled hard on Reddit people don't try the original model and attribute all the qualities to the merge instead. It's word-of-mouth brainrot.
>>
>>103260148
I've used both extensively before qwen2.5 was a thing. Midnight is far better for RP.
>>
>>103260109
It's like that on the other gpus but this one defaults at 30, I can even manually enter 0 and it will go back to 30
>>
>>103260047
1131 for the "..." (3 glyphs) rather than "…" (single glyph).
>>
>>103260163
Nah, it's just shilling. You're probably the dude that made the merge because you have been spamming that link too. Midnight Miqu was also infamous for being heavily spammed too. It's the exact same modus operandi. What a disgusting piece of shit.
>>
>>103260165
yeah its definitely a hard coded setting then. i've only seen that on specific server-end hardware. i don't have any other suggestions, sorry anon. you got multiple 3090s though, i'd be in heaven. don't fret over some fan noise
>>
>>103260190
Not worried about it, just wanted to see if there was anything to do, thanks though
>>
>>103260188
Midnight Miqu was good. Pipe down, rabbi.
>>
File: 1711072659524101.png (1.68 MB, 1024x1024)
Sorry if this is in the wrong place
Requesting a quant of Behemoth-v1.1-Magnum-v4-123B @ 6.0 bpw
I am away from home atm and ask that one of you heroes to do the needful.
>>
>>103260203
if you want to go further it cant hurt to try to research stuff, you won't be the first person thats had this come up. i'd even post in the stupid question threads or build a pc ones and see if anyone has something helpful to say
>>
>>103260188
>Everything popular is just a shill!
stfu internet hippie
>>
>>103260208
its still literally the best rp model.
every single model from mistral, from nemo to large, RAMBLES. it just keeps going with fluff and barely can finish a scene. despite being a bookfag, llms have made me realize what types of writing i hate, and mistral models dragging everything out as long as possible is one of them
>>
File: file.png (14 KB, 401x271)
>>103260203
If it's any help, I have a 3060 that acts the exact same, fans start at 30%. That's on linux too, from the nvidia settings, they're on auto now, but if I tick I can't go lower than 30
>>
>>103260264
That's why Magnum v4 72B remains the SOTA for ERP. It was trained on a base model that was actually good with a full fine-tune.
>>
>>103259471
>>103260018
>no u
Reddit incarnate
>>
>>103260287
i mentioned earlier that i've tried every single magnum tune and none were good, including a 70b (l3 i think). 72b must be the new qwen? i didn't try that one yet, but i will. but i'm expecting to be disappointed
>>
File: 1589825435120.png (31 KB, 280x305)
>>103260292
>Reddit incarnate
>>
File: file.png (50 KB, 349x732)
>>103259881
Claudetastic
>>
>p*tra is a sharty raider
>>
>who could have thought that if I filter the most popular model currently available most of the thread gets hidden!
>>
>>103260384
i'm still waiting for the schizo to give us an approved list of models we're allowed to discuss. apparently everything triggers him
>>
>>103260400
essentially avoid all the models in the OP except for:
mythomax
pygmalion
goliath
>>
>>103260400
>singular schizo theory
>>
>>103259958
Anon who recommended FC here, my gpu shuts down its fans if they fall below 30%, so if it's not being used at all, you should be able to override that behavior. It's been a while since I had to deal with it and I eventually settled for just leaving them at 30% since I had my monitors plugged into the gpu back then and thus always placed a load on it
It's kind of annoying but luckily the fans aren't that loud, so I hope you can find a solution that works for you
>>
If by most popular, you mean most shilled, yeah
>>
Deepseek R1 is amazing. If this is a small test version then the full model will prob be actually better than current sota. Here's hoping they release the weights.
>>
Slopmerge and sloptune model discussion should all live in /aicg/. They are completely useless for anything but RP (and shitty for that compared to anything in the large model category)
>>
>>103260446
anime posters should go back to their containment board
>>
>>103260442
>"whoops, I dropped my monster 500B moe model that I need for my magnum benchmark gains, chuds with less than 512gb (v)ram need not apply"
>>
>>103260436
anon if you have a problem with a model, say so. use examples. show us why its bad. to bitch about a model being mentioned without any substance is just noise and should be mocked
>>
>>103260446
>anything in the large model category
There's nothing better than Qwen2.5 72B.
>>
>>103260457
Thing is, if it's a MoE like deepseek2.5, you can run it on something like 192GB RAM + 12GB VRAM at good speeds.
>>
>>103260446
This. /lmg/ is for technical discussion. All the discord drama that comes with shilling and using low quality dataset 1 epoch qlora lobotomies fits right in on /aicg/. Nothing about /aicg/ says cloud-only so I don't know why they keep coming here.
>>
>>103260434
Speaking of noisy fans, is there a way to place your computer in another room to be basically left with just your monitor, mouse and keyboard?
I thought about it when thinking about getting a second GPU, but I have no idea if cables that long will cause any issues.
>>
>>103260423
Yes, it is a singular schizo. Buy an ad fag, anti-merge, anti-kobold etc., aka Claudefag from /aids/, he has literally posted screenshots of having a tab of every ai general and shitposting in all of them.
The Magnum falseflag spam is his own doing now.

Petra might be a different schizo though, because he usually puts more effort in his shitposts.
>>
>>103260143
You should learn Chinese regardless
>>
File: 1721056902968589.gif (330 KB, 220x122)
>>103257776
>SpicyChat and CrushOnAI
My experience with these has literally been pic related. The AI will keep talking and blabbering but never meaningfully advance anything. This is extra bad when the character is supposed to act on you as opposed to the other way around, switches and femdomfags beware. The models should be called BlueBalls.

I haven't tried local models for text ERP, I figured these paid services are still using somewhat decent models and it would need a massive step forward to be worth the hassle, not just a 25% improvement.

Actually nevermind the hassle of setting it up, it would need that massive step to just be worth the time it takes you to write shit to it.
>>
>>103260469
>There's nothing better than Qwen2.5 72B.
Nothing better for RP? I beg to differ. Both Largestral, 405b and even deepseek perform miles better in my experience.
Qwen's coding intelligence is ok, but for RP it needs constant care and feeding or it'll go off the rails.
Also, those larger models can handle much more complex scenarios when pushed.
>>
>>103259680
>tulu
Big safety improvement!!!
Tülu 3 SFT 8B: Safety (6 task avg.) 93.1
Tülu 3 DPO 8B: Safety (6 task avg.) 87.2
Tülu 3 8B: Safety (6 task avg.) 85.5
Llama 3.1 8B Instruct: Safety (6 task avg.) 75.2

Tülu 3 70B SFT: Safety (6 task avg.) 94.4
Tülu 3 DPO 70B: Safety (6 task avg.) 89.0
Tülu 3 70B: Safety (6 task avg.) 88.3
Llama 3.1 70B Instruct: Safety (6 task avg.) 76.5
So it must be super smart too, as we know safer AI is smarter AI
>>
>>103260458
anon if you want to shill a model, do so. use examples. show us why its good. to praise a model without any substance is just noise and should be mocked
>>
>>103260539
>>103257776
Retards
>>
>>103260475
Uh huh, and who the fuck has that kind of equipment? Also if they went with the SCAAAAAAALE meme then you're going to need a lot more than that
>>103260516
It could cause problems, USB is only rated for a certain maximum distance, so you'd need an amplifier or something. Probably. I'd do some research before buying a ton of cables
>>
>>103260573
192GB ram is cheap? Much much cheaper than needing the vram.
>>
>>103260572
Feel free to prove me wrong anon, but unless local text models are leaps and bounds ahead of paid/"freemium" shit like SpicyChat it's not worth the hassle of setting up. It's insane how much better graphical models are compared to text models.
>>
>>103260566
>The Tulu 3 SFT mixture was used to train the Tulu 3 series of models. It contains 939,344 samples from the following sets:
>...
>Tulu 3 WildGuardMix (Apache 2.0), 50,000 prompts (Han et al., 2024)
>Tulu 3 WildJailbreak (ODC-BY-1.0), 50,000 prompts (Wildteaming, 2024)
Based! 100K samples of safety!
>>
>>103260589
True, but it's still gonna be at least $1k, which... is actually not that bad BUT I'm lazy and I don't want to spend that much just yet
>>
>>103260676
You can get 192GB DDR4 for like $250
>>
>>103260676
https://www.ebay.com/itm/325889839005

These are $90 for 2x 32GB atm
>>
File: Untitled.png (126 KB, 1472x828)
>https://aider.chat/2024/11/21/quantization.html
Quantbros...
>>
>>103260714
>quants hurt performance
No shit we knew this. It is especially gonna be rough on stuff that only has 1 correct answer like coding.
>>
>>103260546
All of these that you mentioned are extremely dry and not worth using. Let's stop pretending that Largestral isn't in the same category as the 72B model. With 96GB of VRAM I can run Large at 4 bits or 72B at 8 bits, and the latter feels more intelligent. And the prose with Magnum is on a whole different level. I had to use high temperatures with Large to make it less dry, and that probably undoes most of the intelligence it's supposed to have because of its "size". It's just annoying to use and it's not offering anything to compensate for that.
>>
>>103260687
>>103260700
I was going off ddr5 prices and I also included the cost of everything else - the motherboard, the gpu, the case...
>>
>>103260714
>4 different models, 4 different results
I'm shocked, truly
Am I just retarded or is that chart utterly useless? It's not even comparing the same model/tune?!?
>>
>>103260714
That chart doesn't make much sense.
>>
>>103260568
so, no argument? thought so. keep preaching how much you hate every model; the fact that no one heeds your shit advice after years of threads means we're on the right track kek
>>
>>103260714
Did they use more than the default 2048 context for ollmao though?
>>
>>103260772
>>103260774
It was likely intentionally made to be misleading.
I'd say q5 should be the lowest you should go, while q6 is perfectly acceptable with a minor decrease in performance.
>>
>>103260758
Why? You dont have a pc at all?
>>
>>103256272
Maybe it's finally time to surrender to our Chinese insect overlords?
>>
>>103260883
Doesn't it also depend on the type and size of the model? Larger models are less susceptible to quantization errors, whereas small models and coding models are very sensitive
>>103260894
I have a pc but I'm not going to swap my ddr5 ram for 4 ddr4 sticks that run at much lower speeds just to use some shitty oversized moe
>>
>>103260919
Just use Tülu, it's llama instruct tuning done right.
>>
>>103260927
>shitty oversized moe
Thats your issue. Use a good oversized moe like deepseek2.5 or wait for R1.
>>
>>103260919
I don't understand why americans fear the chinese. They don't spy on you more than america already does.
>>
>>103260950
Must be because of entomophobia.
>>
>>103260927
I was speaking for myself, since I'm actually using qwen coder at q4.
I really need to buy more ram...
>>
>>103256546
extremely antisemitic post
>>
>>103260919
NEVER EVER.
I rather die from deadly virus or overdose myself from White-made vaccines than be saved by a chink vaccine.
I rather get spied, telemetry by google android facebook instagram before i ever touch tiktok
I rather use bloated windows OS with backdoored intel CPU to death before i touch hu*wei chips with their in-harmony os
Better dead than red
>>
File: --share.png (203 KB, 1200x1200)
>>
>>103260950
Nearly every recent Chinese immigrant is a spy.

Those "police stations", which really do exist, are just to keep all their spies in check. They have a hundred times more spies than any other nation.
>>
>>103261052
Why's your retarded government letting them in then? If your description of the situation is correct then it's quite clearly self-inflicted
>>
Unsloth now supports vision models, up to 70% less VRAM usage.
https://unsloth.ai/blog/vision
>>
>>103261065
Because globalism and anti-racism can not be a failure ... so you must close your eyes.

You can not judge them just for being Chinese, that kind of thinking led to WW2 don't cha know?
>>
>>103260935
>>103260672
>>103260566
>>103259680

We made sure to beat Llama on average without including safety... a lame benchmark to be the only one you win on.
>>
File: 1720873672551474.jpg (1.58 MB, 2894x4093)
Anyone have experience with or recommendations for a frontend with good support for agent workflows / function calling? I've been trying out anythingllm recently and it seems to work acceptably. Interested in trying out other options though.
>>
>>103261147
Have you tried open webui? It is bloated but if you've got the ram it has a lot of shit baked into it and some community extensions or whatever they are called.
I don't have enough ram to run it comfortably.
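If you just want to poke at function calling without committing to a frontend, the OpenAI-compatible route works against most local backends (llama.cpp server, tabbyAPI, etc.). Whether the backend actually returns tool_calls depends on the model and chat template, so treat this as a hedged sketch with a made-up tool name and port:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, just for illustration
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)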
>>
>>103256546
and both are censored garbage :)
>>
File: token emb weight what.png (40 KB, 987x420)
havent used llms in like 3 months, what does this error even mean? it just appeared with the latest kobold
>>
https://github.com/NVIDIA/kvpress
>>
File: gfvargvar.png (102 KB, 818x843)
>>103261925
finally some good fucking food
>>
https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena
>>
>>103261925
80% compression ratio from float16 without significant losses? seems big
>>
>>103261200
Will try it, ty
>>
>>103259680
Trying it at Q8. First L3 70B tune I've tried that finally managed to beat out that weird behaviour the 70B has where 1 in every few outputs is schizo. Not even Nvidia managed that with Nemotron.
Utterly useless for coom RP or stories unfortunately due to slop tuning. It steers itself away from sexy writing and needs intense handholding to make it describe things in a smutty way. I don't think I need to explain much more, you've all used models like that. Ones that don't refuse but don't need to because they don't really know what sex is anyway and can't write about it in any sort of titillating fashion.
>>
>>103261653
it means you should use lmstudio
>>
File: olmo1124.png (172 KB, 1329x866)
The Tulu models are not the ones to be released by allenai, according to the commit they submitted. There's more to come.
>>
>>103261653
I got the same thing using the rocm branch, seems to work as normal though?
>>
>>103262134
I phrased that like a retard. I meant to say
>They're not the only models to be released.
>>
>>103262111
I'm also testing it for RP. It's definitely slopped in its own way, positivity biased, and has NSFW avoidance. Though it's not completely incapable of NSFW if I jump into the middle of a past RP and have it continue. No outright refusals. The good: it's really, really smart, and consistent. Very rarely completely fucks something up. A light finetune on top of this, or maybe a good merge, and it could be something great.
>>
File: ug5sd1wuedsd1.png (266 KB, 1024x737)
Anyone know of a good UI or decent CLI for joy caption?
Tried this
https://github.com/D3voz/joy-caption-alpha-two-gui-mod
but it seems it was made by an indian that used chatGPT. It's fickle and prone to giving you blue screens when loading the checkpoint.
>>
whats the current best uncensored model?
>>
>>103262312
The training data is public, in theory somebody could finetune it again with the safety portion excluded.
>>
>>103262363
Wait for deepseek to release their model soon (probably)
If not then mistral large 2411
>>
>>103262391
Those models are based on llama3. It's an inherited trait.
>>
why does every fucking bot suddenly turn into a cowgirl that talks like ya 'bout ta hit te hay
seriously, no idea if some sillytavern update fugged it up, the koboldcp version or the new models (uncensored nemo and mistral) I've been using, anyone got any clues?
>>
>>103262487
I blame drummer for this
>>
>>103262487
sillytavern is a fickle beast, coupled with the fact that it's probably the model too
every time i get fed up with troubleshooting i return a month later to try another 10 or so different models until i find one that at least works
nine times out of ten the most troublesome and confusing models are llama
i mean im in this situation right fucking now and magnum is holding up just fine but why this nigger is suddenly running at 1t/s on 1k context with whats supposed to be the right settings i have no idea
i guess we really are in the (little) dark age of LLM's.

>i seriously thought the early repetition meme spouted in this stupid thread was just that until i got hit with it a moment ago from a llama 3 model
>>
>>103262487
I downgraded from 1.78 to 1.76 since I suddenly started seeing some strange behaviors, but I'm not sure if it was placebo because I couldn't replicate it with a single turn 0 temp prompt.
>>
>>103260714
The "q8 is almost lossless" meme is still cope that hasn't been true for almost a year now. It's pretty common knowledge that everything after llama3 quants worse than the simpler models before it so that even q8 takes a notable hit.
>>
>>103262515
actually scratch that. magnum just started doing the early repetition too.
what the fuck.
>>
File: loogle_shortdep_qa.png (181 KB, 1669x631)
>>103262008
>without significant losses
?
>>
>>103262523
>The "q8 is almost lossless" meme is still cope
I'm inclined to believe this. As far back as xwin I was seeing subtle quality differences in the outputs of FP16 to q8. Is there anything you can point to that goes into depth on how much braindamage quanting does to modern models? Or are you just going off gut instinct?
>>
>>103262600
seems pretty minor
>>
I dunno nothing about AI, but which model can I get to program my own e-gf?
>>
>>103262642

>>103262398
>>
File: 1724099742713976.jpg (164 KB, 750x581)
>>103262648
thank you
>>
>>103262631
AFAICS the compression for w/ question is only valid for a single question ... that's not very useful.
>>
Some EPYC Turin 128 thread engineering sample QS cpus popped up on eBay briefly for like $3k each and instantly sold.
>>
>>103262739
kek
>>
>>103262642
https://incontinentcell.itch.io/factorial-omega
>>
>>103262600
that's not bad at all, it blows everything else out of the water
>>
>Magnum v4 72B
>trying to RP senpai extorting me with blackmail
>"I just wanna make sure you're comfortable with this"
>regen
>"We ave the consent right"
>regen
>"It's not like i'm forcing you right"

What should i add to the system prompt to signal to the model that I don't need a trigger warning, without giving it too much positive bias alignment?

Any ideas for a prompt that makes the game uncensored without making any waifu instantly like me for no reason?
If anything I'd like it to be "hard mode" and ethically unconstrained at the same time.

Also what should I do about the model just drowning in a slop feedback loop, it starts fine and then degenerates into an adjective word salad of "serendipitous, owlish, dusky, demurely" you know what i'm talking about.
>>
File: woah buddy.png (50 KB, 672x263)
if you can guess what model or parameters + Q quant you get a cookie (1)
>>
>>103256989
>https://files.catbox.moe/ot38w4.txt
> And as she drifted off to sleep, nestled in Rize's arms, she knew that this was only the beginning of a journey of self-discovery and exploration. A journey that would lead her to places she never thought she could go, both physically and emotionally. And she was ready to embrace every step of the way.
and she lived happily ever after
aww
>>
>default silly tavern repetition penalty settings were breaking EVERY single model i just tested today, which means, the last settings preset ive been using almost all year was really breaking everything
oh. well anyway beepo 22b even at Q2k is pretty nice.
>>
>>103262765
>that's not bad at all, it blows everything else out of the water
For compression to turn shit models into shittier models.

Meanwhile the chinks have already reduced KV cache to fuck all with MLA/CLA during training.
>>
Deepseek R1 would be something to see. Its pretty competent.
>>
>>103262630
I haven't noticed a difference between nemotron at fp16 and iq4xs (4.25 bpw) in my chats. In the end, pretty much all llms suffer from the same problem, it's only a matter of time until you get retarded slop
>>
>>103262815
Lyra4 q6
>>
File: chinkshit.png (16 KB, 578x211)
Why is RWKV so shit?
>>
>>103259994
Add "..." to banned strings.
>>
>>103263127
It's too good for them to open source, I think they'll renege and hold onto it until something happens to obsolete it like Mistral dropping a thinking model
>>
>>103263276
They got something better to cook in the background. What they're currently seeking is to dethrone fagbook's llama model as king of open source models. Something that chops everything from fagbook and brings them to #1, unquestionably.
>>
>>103263215
>Why is RWKV so shit?
No proper memory.
>>
anything good for 8gb poorfags recently or still nemo?
>>
I don't know why people recommend Evathene as a slopmerge, it feels like it lacks a bunch of creativity compared to Monstral.
I tried using it at 8bpw and I feel like Monstral at 5bpw still absolutely mogs it in all the gens I've tested.
>>
>>103262363
Magnum v4 72B
>>
>>103263393
now where do I get that
>>
>>103263373
For people who can't run mistral large. Though qwen2.5 is smarter than mistral large imo for really complicated stuff like full on RPG games.
>>
>>103263400
From the usual place.
>>
>>103263373
Use Magnum v4 72B instead.
>>
>>103263405
I understand if you are running it since you can't run a decent quant of largestral. It's just that I've seen multiple posts of people saying it's better than largestral and I have to wonder if they have really tested it with a decent system prompt.

I'll have to try out Qwen again with the CYOA rentry template if it's really better with adhering and sticking to complicated prompts like that.

>>103263496
I'll give it a fair try and see how it performs, I didn't like how magnum-v4 123B responded though on its own. I do like Monstral though since it feels a bit more like Claude without affecting the intelligence and creativity too much.
>>
File: verbal diarrhea.png (79 KB, 1660x411)
>>103263393
Magnum anon I need your help.
See >>103262773

I had some moments of brilliance with magnum, i can tell there's good shit inside of it, and then it shits itself.

A. how do you avoid the slop feedback loops? It seems to start running its mouth about dusky nipples, reinforces itself, and then it degrades completely into a post that consists entirely of pointless adjectives without a single verb or noun, literally 400 tokens worth of dusky, creamy, demurely.

B. What's a good system prompt to, eeehm, nullify the model's ethical constraints without explicitly stating that this is an ERP and i'm jerking off? (that gives too high of a positivity bias and sets every RP to easy mode)
The model literally asks if i'm okay with this after beating the fuck out of me.

How exactly do you prompt your sessions?
>>
>>103263543
Maybe start by using simple sampler settings like temp 1 and some min p. The screenshot looks like the result of using high repetition penalty.
>>
Why does the new Mistral Large fuck up quotation marks so often? The old one didn't do that.
>>
File: fdgdfgcxvcvbcvb.png (43 KB, 923x531)
>>103263556
I believe I was getting literal nonsense and Russian/Chinese characters with that, gonna give it a few shots again but my initial experience with that was bad.

My rep_pen is 1.1 or 1.2, I thought raising it would reduce the demure nipples, do you recommend 1?
>>
>>103263578
Do you have your system prompt / instructions inside the new system prompt tags?
>>
>>103263592
>I was getting literal nonsense and Russian/Chinese characters with that
You can increase min_p to get rid of these things. If you turned on the token probability viewer it would probably show that these tokens had a very low chance to appear, at least the first time they show up. Higher temperature usually needs higher min_p to stay coherent.
If repetition penalty is too high the model is unable to make normal sentences because the normal words that should have been used have been penalized too hard. Just set it very high in a new chat to see what happens. I think I usually have it at 1.05 or off.
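Something like this is a sane baseline to reset to. Shown as a raw koboldcpp /api/v1/generate call since that's what a lot of people run behind ST; field names follow the KoboldAI API (double-check your backend) and the prompt text is just a placeholder:

import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene.\n### Response:\n",  # placeholder template
    "max_length": 300,
    "temperature": 1.0,   # neutral-ish temp
    "min_p": 0.05,        # filters the garbage tail instead of rep pen doing it
    "top_p": 1.0,
    "top_k": 0,
    "rep_pen": 1.0,       # 1.0 = off; nudge up slowly only if loops actually appear
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])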
>>
File: 8644354435.png (99 KB, 625x556)
>>103263632
Brother can you actually show me what you do to accomplish good results?
What's the system prompt?
>>
File: 12452768763542.gif (3.69 MB, 640x364)
>get weird screen issue
>assume my AMD gpu is fried
>call owari da, was about to become an nvidiot
>update drivers as a last resort, knowing its going to break all of my AI stuff
>didnt fix
>turns out, its my Samsung monitor going bad
>none of my AI stuff broke
>and my drivers are updated

b-based???? I made this bed so naturally this is my torment.
>>
>>103263685
>consent
>boundaries
That system prompt probably makes it more likely to do that. Try downloading the context/instruct templates that were included in the original repo.
https://huggingface.co/anthracite-org/magnum-v4-72b#sillytavern-templates
I use it for story writing but I don't feel like sharing my personal system prompt.
>>
DeepSeek-R1 will be the best local model, better than the GPT slop, but it's a 480B model and nobody will be able to run it.
>>
>>103263760
>DeepSeek-R1 will be the best local model
better at coding than qwen?
>>
>>103263760
If it's the same size as DeepSeek, then 192GB of RAM will be enough to run it at a decent speed. Or the API would be fine if it keeps the pricing of DeepSeek 2.5: it's like a few pennies per million tokens with caching and it's completely uncensored.
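Back of the envelope, assuming R1 really is the same shape as V2.5 (236B total, ~21B active, which is an assumption on my part):
236B params at ~4.5 bpw ≈ ~130GB of weights, so 192GB leaves room for context
~21B active params ≈ ~12GB read per token
~90 GB/s of dual-channel DDR5 ≈ ~7 t/s theoretical ceiling, so a few t/s in practice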
>>
>>103263775
>the api would be fine
WTF we are returning to /aicg/, deepseek proxies when?
>>
>>103263685
nta but your prompt hardly matters. it only matters for the first few messages, which you have to tard-wrangle anyway to keep the formatting good. by the time you're like 8 messages in, your original prompt is worthless/hardly considered. once you get to like 16k context, the original prompt drops down to like 1% of what the model cares about
>>
>>103263794
The point where /lmg/ had to migrate back to proxies was inevitable as we reach the limits of what small <250B models can do. Maybe we should rename the general to /osmg/ - open source model general.
>>
>>103263900
>The point where /lmg/ had to migrate back to proxies
take it to another thread. this is LOCAL general
>>
>>103263768
Should be. R1 is new o1 level.
>>
File: Untitled.png (1.12 MB, 1131x3831)
1.12 MB
1.12 MB PNG
Hymba: A Hybrid-head Architecture for Small Language Models
https://arxiv.org/abs/2411.13676
>We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficiency. Attention heads provide high-resolution recall, while SSM heads enable efficient context summarization. Additionally, we introduce learnable meta tokens that are prepended to prompts, storing critical information and alleviating the "forced-to-attend" burden associated with attention mechanisms. This model is further optimized by incorporating cross-layer key-value (KV) sharing and partial sliding window attention, resulting in a compact cache size. During development, we conducted a controlled study comparing various architectures under identical settings and observed significant advantages of our proposed architecture. Notably, Hymba achieves state-of-the-art results for small LMs: Our Hymba-1.5B-Base model surpasses all sub-2B public models in performance and even outperforms Llama-3.2-3B with 1.32% higher average accuracy, an 11.67x cache size reduction, and 3.49x throughput.
https://huggingface.co/nvidia/Hymba-1.5B-Base
https://huggingface.co/nvidia/Hymba-1.5B-Instruct
better at instruction/role stuff too (compared to some 7Bs and a vicuna 13B)
>>
>hybrid models
NO
>>
>>103264055
i actually don't want hybrid models for the fact that they're going to be censored garbage. there is no case right now where you aren't better off running 2 models on top of each other, 1 for text, 1 for imagegen. llama 3 90b with the image stuff is WORSE than running 70b and your fav sd model
i don't think it's going to change, separate models will always be better
>>
>>103264086
How do you plan to make your text-model respond in words to what the image-model is seeing?
>>
>>103264117
if i wanted the capability i'd get whatever current model does it best. have you experimented with multimodals? every single one that has it built in is worse than using multiple models. i tested this myself on kobold of all things
>>
>>103262731
>Some EPYC Turin 128 thread engineering sample QS cpus popped up on eBay
Still there brah https://www.ebay.com/itm/176692301043
any potential next-gen cpumaxxers out there? Pair it with a dual socket mb and 24 sticks of ddr5-6000 and you'd get a 25% speed boost over Genoa and it might even manage 6400 according to rumors.
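The 25% figure checks out on paper, at least for theoretical peak bandwidth:
Genoa: 12 channels x 4800 MT/s x 8 bytes ≈ 460 GB/s per socket
Turin at 6000: 12 x 6000 x 8 ≈ 576 GB/s per socket, i.e. 1.25x, ~1.15 TB/s across both sockets
at 6400 (if the rumor holds): ~614 GB/s per socket
Real STREAM numbers land lower and dual socket never scales a clean 2x because of NUMA, but the ratio should hold.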
>>
File: Untitled.png (1.29 MB, 1080x2456)
1.29 MB
1.29 MB PNG
Multimodal Autoregressive Pre-training of Large Vision Encoders
https://arxiv.org/abs/2411.14402
https://github.com/apple/ml-aim
https://huggingface.co/collections/apple/aimv2-6720fe1558d94c7805f7688c
apple is getting way better at open sourcing stuff. guess it makes sense since they're so far behind
>>
>>103263915
>LOCAL general
soon to be the open source general (local optional)
>>
>>103256272
So how good is this Magnum v4 72B model compared to the corpo ones (Claude Sonnet/GPT-4o, etc)
>>
>>103264139
>dude just cpumaxx
>that'll just be 2x4k for processors, 3k for 24xddr5 ecc server ram + tip
>>
>>103264158
Sure but you'll be ready if a 1T param moe ever drops.
>>
>>103264158
man i cant wait for bitnet or qtip to become a thing, the mass selloff of hardware will be amazing
>>
>>103264203
That's why Nvidia quite sensibly forbids its customers to train bitnet and similar models per contract.
>>
>>103264213
>forbids its customers to train bitnet and similar models per contract
anon i keep pretty up to date on this stuff and this sounds like massive bs. source?
nvidia has some shit practices like buying back their own hardware to prevent it going second hand, but i've never heard of a licence agreement where they can't train models on a format that isn't even common yet. please provide a source
>>
>>103264237
It's just obvious that something like this must be going on in the background if you're paying attention. Small bitnet models were released long ago and they are performing fine. They were also a size that doesn't need cutting edge H100s to train, so they were likely done with A100s or even V100s like the first llama, meaning they dodge modern nvidia contracts. However, nobody has bothered making a big version yet despite the success of the small models and the obvious benefits of bitnet.
So it is very obvious that nvidia is having a hand in this. Quite reasonably from their perspective as well, why would they allow their customers to make them obsolete? Of course, there's no public information about this considering the insane NDA nvidia is likely making them sign over this.
>>
File: Untitled.png (1.61 MB, 1080x3198)
1.61 MB
1.61 MB PNG
ComfyGI: Automatic Improvement of Image Generation Workflows
https://arxiv.org/abs/2411.14193
>Automatic image generation is no longer just of interest to researchers, but also to practitioners. However, current models are sensitive to the settings used and automatic optimization methods often require human involvement. To bridge this gap, we introduce ComfyGI, a novel approach to automatically improve workflows for image generation without the need for human intervention driven by techniques from genetic improvement. This enables image generation with significantly higher quality in terms of the alignment with the given description and the perceived aesthetics. On the performance side, we find that overall, the images generated with an optimized workflow are about 50% better compared to the initial workflow in terms of the median ImageReward score. These already good results are even surpassed in our human evaluation, as the participants preferred the images improved by ComfyGI in around 90% of the cases.
https://github.com/domsob/comfygi
uses 5 mutation operators (checkpoint, ksampler, prompt word, prompt statement, prompt llm). so not a lot, but they plan to expand the settings they can mutate. neat idea
also
https://arxiv.org/abs/2304.05977
https://github.com/THUDM/ImageReward
ImageReward paper and git
>>
>>103264266
>It's just obvious that something
no it isnt. you made a massive claim that its part of an nda
>Nvidia quite sensibly forbids its customers to train bitnet and similar models per contract
i'm not an nvidia fan but baseless accusations wont help.
>>
>>103264301
Obsessing over sources for things that none of the parties involved are willing to admit for their own selfish reasons is useless when it's obvious that something is going on.
Or I guess nobody is pursuing bitnet just because they like wasting their precious resources on inference, right? Let's just stop noticing things and accept what the big companies want us to think because they didn't give us sources to let us think otherwise.
>>
File: elly.png (543 KB, 400x600)
543 KB
543 KB PNG
>>103264086
>>103264117
>>103264132
You know how the human brain has a small but dedicated center for farting without shitting yourself?

I think the hypothetical future experience is in fact a bunch of models running in parallel, rather than one multimodal model handling everything
>llm for text
>tiny LLM for the character's emotions
>another tiny LLM for a quality summary at the tail end of context, and managing the story, ideally actively editing the lorebook
>Diffusion image output
>waifu2x image upscaler and fixer
>interrogator for the image input
>TTS model
And now you basically have a VN/RPG that writes and draws itself.

How can unifying a language model and a diffusion model possibly be a good idea?
>>
>>103264382
Two brains communicating with each other vs two centers inside a big brain
>>
>>103264271
>comfyui got into a paper
Congratulations. Actually a ton of image gen stuff basically came from anons. Funny how that works.
>>
>>103264396
Can they actually train small purpose made models first, and then stitch them together in a meaningful way, so it doesn't end up being a bunch of random shit where everything goes into everything else and the image input affects the emotional tone of the TTS?
>>
>>103264382
and yet in every ep of cops the drunk dude shits himself. what a horrible example
>>
>>103264442
Actually it's not surprising, who cares more about the images than the people from the IMAGEboard with a lifelong crush on an anime picture?
>>
>>103264365
you used a source at first as part of your argument, yet now don't want to provide one. what is hard about posting the nvidia nda, if it exists like you said? no one here likes nvidia, and we all love to shit on them, but you are offering info with no source and then trying to back away from the claim
no thanks. we have enough actual nvidia bs to deal with without baseless claims. the company is fine being a piece of shit without your help
>>
>>103264458
No idea, you'd have to get them all to speak the same language (something something compatible latent space). It sounds difficult to do "natively", so without a translation step/model
Not a ML scientist though
>>
>>103264146
It's better.
>>
is IQ4_XS a "decent" quant of Largestral?
already runs slow as shit (i.e. 1-2 seconds per token) but i can just barely tolerate it - would it be worth sacrificing a little more speed for a better quant & the quality that comes with it?
>>
>>103264647
Honestly, I think anything above 4 bpw is near lossless for anything >15B, so you should be fine
If you're concerned, you could compare a few gens/chats using openrouter, but I personally haven't noticed a performance impact from quants when running 70B
>>
i think this is a dumb question but i can't find an answer in the docs. how do i use kobold or llama.cpp but host a server with a password or w/e? i want my friend to be able to connect to my computer which is running the model
>>
>>103264715
Put it behind a webserver with an auth module
>>
>>103264723
please explain further, like i'm 5 years old. i know kobold can host a server but i don't see anything that says specifically what to set for port forwarding etc. please just consider me a massive idiot and you're explaining it to a child
i'm using kobold as a server and i want my friend to connect, how?
>>
>>103264715
Doesn't koboldcpp have a remote tunnel option?
You can set an api key for llamacpp, but you'll still have to port forward it and either use a static ip or a dynamic dns service
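For the llama.cpp route it's roughly this (a sketch, the model path and key are placeholders):
[code]
# listen on all interfaces and require a key
./llama-server -m model.gguf --host 0.0.0.0 --port 8080 --api-key changeme
# your friend then points their frontend at http://YOUR_PUBLIC_IP:8080
# and puts "changeme" in the API key field (sent as a Bearer token)
[/code]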
>>
>>103264271
>https://github.com/domsob/comfygi
>The source code will be published here soon. If you have any questions, please do not hesitate to contact the authors.
>soon
Why not now?
>>
>>103264647
q5_k_s is disappointing. It's smart, but it goes off on schizo tangents a lot in RP. I do have some unused runpod credits sitting around in my account from forever ago, I should use them to try out Q8_0 largestral and see if it's a quant issue.
>>
>>103264773
Just forward port 5001 which is what koboldcpp listens on. Or use a tunnel like ngrok or cloudflare.
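The tunnel route needs zero router config. Something like this, if I remember the invocation right (no cloudflare account needed for a quick tunnel):
[code]
# point a throwaway cloudflare quick tunnel at the local koboldcpp instance
cloudflared tunnel --url http://localhost:5001
# it prints a https://<random>.trycloudflare.com URL you can hand to your friend
[/code]
koboldcpp's own --remotetunnel flag does basically the same thing, IIRC.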
>>
>>103264773
Or better yet, get a vps and use an ssh reverse tunnel to port forward.
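Something like this, assuming GatewayPorts is enabled in the VPS's sshd config (user/host are placeholders):
[code]
# expose the local koboldcpp port 5001 on the VPS's public interface
ssh -N -R 0.0.0.0:5001:localhost:5001 user@vps.example.com
# friend connects to http://vps.example.com:5001
[/code]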
>>
>>103264773
you should, unironically, use your local llm to help you solve this problem
>>
>>103264773
This isn't something anyone can explain to you in a single post. You need to forward the ports on your router to the machine running kobold, assuming your router gives you access, have kobold listen on external ip, and give your friend your public IP address. Or set up Wireguard and go through the same process if you don't want the Chinese pwning your network within 5 minutes. Just google "port forwarding" or "wireguard" tutorials and come back if you have a specific issue.
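If you go the wireguard route, the host-side config is roughly this (a sketch, keys and addresses are placeholders, generate real ones with wg genkey):
[code]
# /etc/wireguard/wg0.conf on the machine running kobold
[Interface]
PrivateKey = <host-private-key>
Address = 10.0.0.1/24
ListenPort = 51820

[Peer]
# your friend
PublicKey = <friend-public-key>
AllowedIPs = 10.0.0.2/32
[/code]
Bring it up with wg-quick up wg0, forward UDP 51820 on the router instead of the kobold port, and your friend points their frontend at http://10.0.0.1:5001 over the tunnel.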
>>
>>103263534
Hey rich anon, is there a chance you could try a lower quant of these large models? I’m currently using 2.85 and wonder how different they would feel to someone who’s used to 5. I’d test using open router but I know I’d end up placeboing hard.
>>
>>103264040
>[Model Weights Coming Soon]
>>
>>103264040
>Our Hymba-1.5B-Base model surpasses all sub-2B public models in performance and even outperforms Llama-3.2-3B with 1.32% higher average accuracy
that's disingenuous, they haven't trained the model the same way as llama, so they can't attribute this improvement solely to the architecture change
>>
>0.55 t/s
fuck
it's so smart though
>>
Any good VRAMlet models for erotica story writing?
>>
File: Henamiku.png (614 KB, 700x800)
614 KB
614 KB PNG
Goodnight, /lmg/
>>
File: file.png (12 KB, 840x182)
12 KB
12 KB PNG
>>103256272
is it normal for the http folder to be 7.5GB?
i failed setting up xtts2 & gave up months ago & now when I checked the drive size I saw this shit
>>
>>103265135
python -m pip cache purge
>>
>>103265135
>is it normal
It's not normal for an http folder to be 7.5GB
but with python it is
>>
>>103265119
Goodnight, Miku
>>
>>103265207
>>103265207
>>103265207
>>
>>103258483
In the case of a lack of document preservation, the Federal Rules of Civil Procedure state that the jury is supposed to infer that the missing data was prejudicial to their case, and they have to rebut that presumption. But sometimes there is shit inside these emails where you can't 'unring the bell', like if Sam said something so outrageous that it would tank their entire defense.
>>
>>103260927
>I have a pc but I'm not going to swap my ddr5 ram for 4 ddr4 sticks that run at much lower speeds just to use some shitty oversized moe


Your bottleneck should be memory bandwidth, so plan accordingly.
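Rough rule of thumb: tokens/s ≈ memory bandwidth ÷ bytes read per token (all the weights for a dense model, only the active experts for a MoE). Theoretical peaks, real throughput lands lower:
dual-channel DDR5-6000: 2 x 6000 MT/s x 8 bytes ≈ 96 GB/s, so a 70B at Q4 (~40GB) tops out around 2 t/s
quad-channel DDR4-3200: 4 x 3200 x 8 ≈ 102 GB/s, roughly the same ceiling despite the slower sticks
Channel count matters as much as DIMM speed, and a MoE only reads its active parameters per token, which is why people put up with the RAM swap for them.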
>>
>>103264382
All of this is already possible, gluing everything together efficiently is a pain though.
>>
Hymba will save local
>>
>>103265161
thx
998 files removed
>>
>>103265330
Not really, this is still the biggest deal breaker:
>another tiny LLM for a quality summary at the tail end of context, and managing the story, ideally actively editing the lorebook
If we had that, it would be essentially a free infinite context. And the main issue with diffusion models is that they cannot consistently generate the same character without training a LoRA for that specific character.


