/g/ - Technology


File: Ge0_D0fbMAAFI19.jpg (784 KB, 3204x4096)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103525265 & >>103515753

►News
>(12/14) CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2/
>(12/14) Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361
>(12/13) Sberbank releases Russian model based on DeepseekForCausalLM: https://hf.co/ai-sage/GigaChat-20B-A3B-instruct
>(12/13) DeepSeek-VL2/-Small/-Tiny release. MoE vision models with 4.5B/2.8B/1.0B active parameters: https://hf.co/deepseek-ai/deepseek-vl2
>(12/13) Cohere releases Command-R7B: https://cohere.com/blog/command-r7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1691305744399524.png (128 KB, 800x933)
►Recent Highlights from the Previous Thread: >>103525265

--Paper: CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models:
>103535406 >103535921
--SillyTavern alternatives and Horde setup discussion:
>103528320 >103528393 >103528436 >103528449 >103528495 >103531379 >103528632 >103528674 >103528694 >103529152
--Machine translating Japanese games in emulators: OCR challenges and alternatives:
>103535278 >103535335 >103535525 >103535593 >103535605 >103535766 >103535817 >103535884 >103535777 >103535799
--Anon seeks offline AI waifu companion:
>103530400 >103530414 >103530426 >103530445 >103530449 >103531933
--New models added to llama.cpp, including Qwen2VL and GigaChat:
>103530184 >103530256 >103530311 >103530347 >103530384 >103530395 >103530413
--Discussion on neural network models and upscaling techniques:
>103530495 >103534679 >103534738 >103534778 >103534803 >103534853 >103534858
--Phi4 weights testing and performance discussion:
>103529570 >103529626 >103529658 >103529824 >103531089 >103531393
--LLMs generating novel words and their internal representations:
>103532043 >103532124 >103532221 >103532326 >103532321
--Anon asks for llama 1 download and gets help with GGUF and Hugging Face models:
>103525965 >103525982 >103527006 >103527047
--Debating the merits of 70b models at low bit depths vs small models at high bit depths:
>103530010 >103530017 >103530028 >103530095 >103530226 >103530333 >103530555 >103530132 >103530164 >103531408
--OpenAI's $2000/month subscription model and PhD-level intelligence claims scrutinized:
>103526958 >103526983 >103527039 >103527070 >103527254 >103527417 >103528331
--Anon hesitant to buy 5090 for LLM inference due to high price and power consumption:
>103530725 >103530782 >103530817 >103531756 >103531784 >103531799 >103531859 >103531844
--Miku (free space):
>103527704 >103536763

►Recent Highlight Posts from the Previous Thread: >>103525267

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103536775
>ISO/JIS enter key
Disgusting. I should push the Miku back to save her from the horror.
>>
https://huggingface.co/Apollo-LMMs/Apollo-7B-t32
https://huggingface.co/papers/2412.10360

I just want live video stream into the model, that would be cool.

>, the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made without proper justification or analysis.
Based
>>
File: HunyuanVideo_00432.mp4 (472 KB, 640x480)
>>
File: 1707467750791450.mp4 (224 KB, 1504x934)
>>
I decided to revisit Nemo using different presets and I'm actually having quite a bit of fun with it
>>
>>103537558
Nemo is great.
Truly the champion of its weight class, and it performs super well all the way down to 4bpw.
I wonder what the fuck they did to make this model and if that approach scales.
Maybe it's really just a question of data instead of any "gimmicks" or fancy techniques.
>>
>>103537227
for the ones doubting, this is legit btw, oyest of veys
>>
>>103536775
>(12/14) CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2/

I keep getting a special token error when I try and run anything with this. Has anyone tried it?
>>
After more testing, I'm more or less convinced that the impression of Llama 3.3 being more permissive in terms of content was just a happy coincidence. It doesn't handle snuff, gore, non-consensual, or taboo content well, particularly if "include names" is disabled in SillyTavern. If anything, the trick of changing the role from "assistant" to something else seems less effective than it was with previous Llama versions.
>>
Anons, what are you REALLY using this shit for? There is no point running models locally for most tasks, the mainstream offerings vastly outperform whatever shit you can run locally... unless you are a privacy schizo or a cooomer.
>>
>>103537827
>unless you are a privacy schizo or a cooomer.
I am both, thank you very much.
It's also a neat toy to play around with.

>>103537746
Have you tried the usual steering tactics like adding a list of tags as a prefill to the assistant's message, changing the role to narrator, etc?
>>
>>103537827
>unless you are a privacy schizo or a cooomer.
like everyone else here?
>>
>>103537827
Coomer. Doing (e)rp and nothing else. Output is decent since my computer can handle large models ok. These days I have the most fun creating new cards, trying different prompting and trying out new models.
If I didn't have a money-draining GF maybe I would try claude, a better gpu setup or something.
>>
>>103537852
You can eventually coax Llama 3.3 into writing what you want if you prefill assistant messages with suitable text (including simply {{char}}:), but it really wants to avoid violence, gore and so on. It doesn't have too many issues with consensual sexual content in that case.

Removing {{char}}: from the prompt turns it into a cucked assistant, almost no matter what you write in the system prompt or the prior conversation history.
>>
>>103537827
(E)RP, that's it.
I have tried to use LLMs for other stuff like data processing, classification or transformation... but the end result is never perfect, even if you use cloud models, so I end up having to review all results, which becomes a bottleneck if I'm dealing with huge amounts of data.
>>
>>103537827
SARR IT WORKED
POST HAVE MAKE YOU BUTIFUL LADY
>>
I just woke up so I'm retarded and trying to figure out this chat template for that russian model while it downloads
>{%- set system_message = bos_token + messages[0]['content'] + additional_special_tokens[1]

Is it trying to tell me that the format is
role<|role_sep|>
blah blah blah
?
Basically backwards ChatML style?
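Actually, the lazy way to sanity-check this is to let transformers render the template instead of reading the Jinja by hand. Rough sketch, assuming the tokenizer from the GigaChat repo in the OP news loads cleanly with trust_remote_code:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ai-sage/GigaChat-20B-A3B-instruct", trust_remote_code=True)
msgs = [
    {"role": "system", "content": "system prompt goes here"},
    {"role": "user", "content": "hello"},
]
# prints the raw prompt string so you can see exactly where the role/separator tokens land
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))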
>>
>>103537827
>privacy schizo or a cooomer.
I'm both
>>
>>103538106
well I'm too lazy to update my shit and apparently nothing I have can load it despite it effectively just using the deepseek architecture. no russian nala test today.
>>
>>103537641
downloading the model right now. will tell later
>>
>>103537852
>>103537901
>>103537932
>>103538009
>>103538183
wait so this is /g/'s coomer general?
>>
>>103538237
There doesn't seem to be any difference between this general and the chatbot general.
I'm using an embedding model for semantic search, I started the project 4 days ago. After it's working for that, I'm intending to find a chat model that can run in a RAG context in 6 GB VRAM (looking at Llama-3.2-1B-Instruct, but I don't see where to download it without kneeling to the zuck)
>>
>>103538279
https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF
>>
>>103538297
thanks!
>>
>>103536775
The growth of LLMs will slow drastically, as the flaw in the "neural scaling law" becomes apparent: most people have no use case for a second brain! By 2030, it will become clear that LLMs' impact on the economy has been no greater than that of the internet.
>>
HN told me that there has been huge improvement in local models in the last year? How true is this? can they effectively ERP yet?
>>
Are there any locally hostable models that are usable for language learning or anything else useful, like answering questions a la chatgpt?
Also, how do people access their models remotely? Has anyone bridged one to a chat app like Signal or Whatsapp?
>>
eva bwos we got a version bump
https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
>>
kill yourself and buy an ad, in that order.
>>
>>103538509
>DELLA linear merge of v0.0 with an unreleased checkpoint from a different run. Reduced overfitting, better long context comprehension and recall, less repetition, more stability.

Ooh, sounds promising. Will test and report back.
>>
>>103538206
I get "KeyError: '' ". I dunno what's wrong..
>>
Gemini 2.0 flash experimental.

Heh.
>>
File: 1715367001153147.jpg (954 KB, 800x1237)
>>103537558
>different presets
What are you using?
>>103537827
RP and co-authoring/editing practice. I'll occasionally use it to summarize news/wiki articles and essays I don't care about, but since I can't into RAG it's not ideal.
>>
>>103536845
>Meta Apollo
>We employed the Qwen2.5 (Yang et al., 2024) series of Large Language Models (LLMs) at varying scales to serve as the backbone for Apollo. Specifically, we utilized models with 1.5B, 3B, and 7B parameters

Meta literally used Qwen as a base for their video understanding model, lmao. Chinks literally cant stop winning...
>>
>>103538783
>Chinks literally cant stop winning...
Papers are like 90% chink already. For Apollo too.
As long as they give me the foxgirls I will kneel.
>>
>>103538551
the download script doesn't work if you download only the 0.5B model. The CosyVoice-300M folder has the voices. I was able to generate something but I wasn't able to generate English yet.

I thought this was more speech-to-speech, but it doesn't seem to be aimed at real-time inference.
>>
anyone know a way to stop it from outputting descriptions in asterisks? I can imagine things myself, but it feels like descriptions written without asterisks are more verbose and interesting.
>>
>>103538851
take the asterisks out of the starter and example messages, the model won't use them unprompted unless it's very overbaked
>>
File: Screenshot 2024-12-16.png (54 KB, 945x614)
Nashvillebros is this true?
>>
>>103538579
I hecking LOVE LOVE LOVE CoT!
>>
>>103538851
What >>103538873 said, plus if that's not enough, you can outright tell it not to use asterisks in the system prompt.
>>
>llama.cpp stopped supporting regular make
>my cmake is horribly fucked up because of some unholy shit I did to compile some other project I never actually used and I can no longer build
aieeeeee gerganov whyyyyyy
>>
>>103537827
I use it as a helpful assistant.
I suck at approaching subjects I'm interested in and QwQ really shines at planning stuff so it is really helpful with that.
I don’t care if it is lacking in knowledge since you can just feed it as context if needed.
>>
>>103539211
Same here. I had to download the binaries.
>>
what's the best general model that I can use for chatting, got 24gb vram?
>>
>try OuteTTS
>works nicely out of the box
>try to voice clone
>no output
I HATE TTS
I HATE TTS
I HATE TTS
>>
>>103539276
The following models should fit in 24gb.
>qwen 32b q8
>mistral 22b q6
>mistral 12b q8
>>
>>103539351
Err, that should be
>qwen 32b q4
>>
>>103539306
>OuteTTS
Why not gpt-sovitsv2?
>>
>>103539369
naaaah bruh this shit too complicated, theres literally no exe lol
>>
>>103539369
sovitsv had way too many steps to produce anything
>>
File: Lms.jpg (33 KB, 509x494)
Why do people here hate LM Studio? I'm new to this and I've been using it without problems. Am I missing something?
>>
>>103539403
It's proprietary slopware that does not contribute to the upstream projects it's built on.
>>
File: 1714205445058131.png (382 KB, 885x1146)
Bros, we're gonna eat good very soon. Trust the plan, it's all converging as nvidia tries to stem the tide for VRAM premiums.
https://x.com/_lewtun/status/1868703456602865880
>>
>>103539461
BitNet LLM 2.0 LCM BLT CoT breakthrough soon.
>>
>>103539403
>Why do people here hate LM Studio?
/lmg/ is privacy/openness focused and has a history of working directly with and contributing to the open source projects that were at the genesis of the current ai boom (and even in the before times).
A lot of the current user-friendly AI systems are just wrapping these without giving back or even acknowledging these bits exist.
Also, if you're here and worried about privacy, you aren't gonna be running some exe you downloaded from the internet and feeding it personal or proprietary data.
>>
>>103539498
memeing a miku.txt prompt into llama cpp's repo during its inception does not count as having contributed to open source projects. Literally no one still browsing this thread can code.
>>
>>103538763
>What are you using?
Just Temperature 1.0 and Repetition Penalty 1.1
>>
>>103536845
Finally, another honest paper.
>>
>>103539211
>>103539220
Kek, same happened to me as well. I eventually solved it. Don't remember exactly how. Oh wait nvm I do remember. I eventually found that I needed to delete the nvcc in my usr/bin folder, and run with these commands.
export PATH=/usr/local/cuda/bin:$PATH
cmake -B build -DGGML_CUDA=ON -DGGML_LLAMAFILE=OFF
cmake --build build --config Release --target llama-server llama-quantize llama-perplexity -j 8


I actually tried to get ShatGPT to help me at first but it didn't find the issue. I found the issue myself. Maybe Claude would've gotten it.
>>
>>103539403
jesus christ I hate zooming reddit cucks like you so fucking much you have no idea
greetings from serbia
>>
For those fans of L3.3, any thoughts on its NSFW/NSFL capabilities? It seems to get sloppier and dumber when you cross that line.
>>
>>103539750
Best lolis by a long shot, nothing comes close at all.
>>
>>103539750
Haven't actually tried base L3.3 for such purposes, but Eva is absolute peak, so at the very least, it makes for a damn good base for a tune. Clever to the point of wit, coherent, requires very little wrangling.
>>
>>103539750
3.3 is trash, there's just one anon samefagging with a clear intention to troll.
>>
>>103539750
The base model? Seemed fine to me, but I'm the guy that was using the assistant and user names switched out so that might have gotten past some censorship biases in the logits. I know for a fact at least that using user/assistant made my model dumber in a NSFW adjacent scenario I tested, so I just switched it out and forgot about it. Then I moved to testing Eva and haven't loaded up base since.
>>
>>103539878
Compared to what? Mistral Large is too slow on my machine, Qwen is even worse, and Miqu is too old and dumb.
>>
Is Nemo still SOTA for VRAMlets?
>>
any new great models for nsfw stuff at around 12b?
>>
File: 1461486806692.jpg (20 KB, 470x362)
>>103539403
They don't like it because you don't need to be a complete nerd to install and run the program.
If they admit it's serviceable then they admit they wasted their time dorking around in Python when an exe could have done the job in 15 seconds.
>>
>>103539509
I read or at least skim every single /lmg/ thread.
>>
>>103539918
yes...
>>
>>103539484
With test time training liquid network lora
>>
>>103539509
oobabooga used to be a /lmg/ anon, you know?
>>
>>103539962
we know
>>
>>103539962
being able to read the documentation and use an API is NOT coding.
>>
>>103539820
Eva ?
>>
>>103538509
first impression - it's fine, but it feels like it lost a little of the eva sovl
should be more easily wrangled but I find it a little less fun than v0.0 on a quick vibe check
>>
>>103540037
https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0

v0.1 is out now, but doesn't seem to have any quants yet.
>>
What's the best coom model for 90GB of VRAM? I got two decommissioned A6000s for (relatively) cheap from work.
>>
>>103540060

>>103540053
>>103538509
>>
File: file.png (19 KB, 499x144)
>>103540053
>doesn't seem to have any quants yet.
https://huggingface.co/bartowski/EVA-LLaMA-3.33-70B-v0.1-GGUF
right on time
>>
>>103539750
Doesn't merge well with Tulu due to differences in prompt format so it's DOA
>>
>>103539890
>I know for a fact at least that using user/assistant made my model dumber in a NSFW adjacent scenario I tested, so I just switched it out and forgot about it.
What technique is this referring to? Just treating the instruct model as a text completion one without any user assistant turns?
>>
File: file.png (125 KB, 604x546)
ollama wonned again
>>
>>103540227
niggerganov should just ack himself at this point, or sell the llama.cpp project to ollama.
>>
File: angryshikanoko.webm (3.87 MB, 1920x1080)
>>103539969
>>
>>103540112
I'm pretty sure it's about using the instruct format but without the user and assistant roles the model was trained with.
So if your format is
><specialtoken>assistant<specialtoken>
><specialtoken>user<specialtoken>
You could change it to be
><specialtoken>CharacterName<specialtoken>
><specialtoken>UserPersonaName<specialtoken>
or
><specialtoken>Game Master<specialtoken>
><specialtoken>Player<specialtoken>
or
><specialtoken>Narrator<specialtoken>
><specialtoken>Dude<specialtoken>
etc etc.
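With llama 3 models concretely, that ends up looking roughly like this (special tokens from memory, so double-check them against the model's tokenizer_config.json; in the real template there's also a blank line between the header and the message body):
><|start_header_id|>Narrator<|end_header_id|>
>The tavern has gone quiet.<|eot_id|>
><|start_header_id|>Dude<|end_header_id|>
>I walk up to the bar.<|eot_id|>
><|start_header_id|>Narrator<|end_header_id|>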
>>
>>103539929
which ones are you using at 12B? The ones I use aren't exactly new
unironically check the first 5:
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
>>
>>103540262
no no no
we let them feel safe, then a surprise license rug pull
>>
>>103540112
What the other guy said.
>>103540270
I'm skeptical it wasn't trained a bit with other names. The model should be much more rigid if that was the case, but it feels to me like it's quite flexible.
>>
>>103540227
I hate pgvector and ollama so fucking much
>>
>>103539947
Kek this, I just want to coom and for the thing to just werk, I want a solution. I don't want to fiddle around running stupid commands when it can be done in a few clicks.
>>
does fucking around in the CLI for three hours compiling shit make the orgasm feel better?
>>
>>103539461
That can only fix its reasoning, not its lack of trivia knowledge.
>>
buying a 5090 for AI erp would be the beta buxx experience for perma virgins
paying to get laid, but instead of paying a wife for real pussy, you pay for technology to masturbate
>>
>>103540364
#LLM2.0, rag is the future, resistance is ignorance.
>>
>>103540287
Is TheDrummer/UnslopNemo-v2 (chatml) really an upgrade from mistralai/Mistral-Nemo-Instruct-2407?
I've seen people here say in the past that all Nemo finetunes are dumber than Nemo itself.
>>
>>103540374
Just like bitnet huh
>>
>>103540392
But RAG exists and sort of works. Bitnet is just vaporware.
>>
>>103540423
Bitnet models exist and sort of works.
>>
>>103540335
From experience, it just intensifies the "oh man, all that effort for that?" feeling you get afterwards.
>>
>>103540430
That's a stretch but alright, stay coping on LLM 1.0 while the world moves on!
>>
>>103540370
im rich tho? so i can afford it
>>
>>103540335
I don't use AI for coom.
>>
>>103540451
fag
>>
Llama next year, Gemma next year, Qwen next year, anything this year? Also, local Suno where
>>
>>103540465
Official Phi-4 is this week.
>>
>>103540437
Meanwhile you are waiting for the innovation to come and it just doesn't. Have fun waiting for your world to supposedly move on while people actually enjoy the big models that exist (either through API or, if they're not poorfags, on their own PCs).
>>
>>103540492
He meant interesting models though.
>>
>>103540501
Phi-4 is VERY interesting
>>
>>103540513
no it's not
>>
>>103540513
We'll see when Nala anon tests it.
>>
>>103540374
>rag
Way too limited. Doesn't allow the model to make inductive jumps in what information it needs.
>>
>>103540527
He already did tho
>>103505034
>>
>>103540454
Feeling ironic today aren't you.
>>
>>103505034
It's over...
>>
>>103540370
where can I pay for a wife? I'm in the EU
>>
>>103540575
>>103540556
Who cares? There's no point using AI for coom.
>>
>>103540556
massive skill issue
>>
>>103540387
Depends on the use case really. You will have to see for yourself, so far it has been pretty good for me for erp
>>
>>103540628
it's a three word prompt, anon. it's not possible for skill to be involved either way.
>>
>New Models: Megrez 3B Instruct and Megrez 3B Omni with Apache 2.0 License
https://huggingface.co/Infinigence/Megrez-3B-Instruct/blob/main/README_EN.md
https://huggingface.co/Infinigence/Megrez-3B-Omni/blob/main/README_EN.md

Merguez 3b kek
>>
be aware of the m$ shills for phi...
They're only trying to do the needful
>>
>>103540665
It absolutely is, as evidenced by his tests always having super broken text formatting which he himself said weren't due to the models.
>>
>>103505034
starts with plain text dialogue, ends that paragraph with plain text narration, next paragraph has quoted and italicized dialogue...
yeah, no model above 3B is that broken, the dude's doing something clearly wrong, like conflicting instructions or something.
>>
>>103540701
>Values and Safety: While we have made every effort to ensure compliance of the data used during training
>>
HunyuanVideo mogged
>>103540020
>Google Veo 2
https://x.com/GoogleDeepMind/status/1868703624714395907
https://xcancel.com/GoogleDeepMind/status/1868703624714395907
>>
>>103540923
Cool, where are the weights?
>>
So, even after nearly 2 years, we still don't have a local model that even remotely matches the intelligence of gpt4 for ERP?
>>
File: if only you knew.png (370 KB, 744x719)
What is the conventional wisdom for caching? I am asking about "Use 8 bit cache to save VRAM" and "Use Q4 cache to save VRAM." in oobabooga.
Finally figured out that I need Flash Attention for them to work. (Which is afaik something that I should enable by default as it has zero drawbacks)
I experimented a bit, but didn't really notice too much of a difference. Do the differences become pronounced when you are many thousands of tokens into a chain?
Going from none to 8 bit, 8 bit seems to save a larger chunk of VRAM, compared to going from 8 bit to Q4. Is it a good halfway balance?
For reference I tested these on Mistral Nemo, is cache more/less important for other models?
>>
>>103540988
Correct.
>>
>>103541013
And using GGUF under llama.cpp, I forgot to add.
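For reference, with plain llama.cpp this is all launch flags, roughly like so (flag names as of recent builds, check --help on yours; model path is a placeholder):
./llama-server -m model.gguf -ngl 99 -fa --cache-type-k q8_0 --cache-type-v q8_0
IIRC it's the quantized V cache that actually needs flash attention; K-only works without it. And the cache is sized by however much context you allocate, so the VRAM savings scale with your configured context size rather than with how deep into the chat you are.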
>>
>>103541022
How long do we have to wait
How long do we have to suffer
>>
File: lmg.png (1.84 MB, 1387x778)
>>103536775
>>
>>103539509
I've got 2 github accounts with contributor status on llama.cpp, mikupad, ooba and sovits (and multiple lmg-related personal projects up)
>>
File: 1722784233319186.png (269 KB, 627x802)
>Shipping document suggests that a 24 GB version of Intel's Arc B580 graphics card could be heading to market, though not for gaming

oh no no no no, njudea and ayymd vramjewsbros, how do we respond without assasinating the ceo of intel?

are intel ballsy enough to eat up the entire homebrew community with this?
>>
>>103541132
based. if every anon were like you, this thread would be a better place, even if all you did was make a commit fixing a typo.
>>
>>103539403
notice how you (a nigger shill) couldnt respond to the truth >>103539436

>>103539731
бaзиpaн
>>
>>103541324
If they release it for like under $800 we are so back
>>
>>103541324
Ohh, shinyyy
>>
>>103541324
>though not for gaming
I am worried that it will be released with an unhinged prosumer price (not as unhinged as nvidia but still unhinged probably)
>>
>>103541324
https://web.archive.org/web/20241216204028/https://www.pcgamer.com/hardware/graphics-cards/shipping-document-suggests-that-a-24-gb-version-of-intels-arc-b580-graphics-card-could-be-heading-to-market-though-not-for-gaming/
>>
File: file.png (82 KB, 242x218)
>>103541324
papa's parting gift to nvidia?
>>
File: 1720030134404974.png (87 KB, 581x348)
>>103541354
https://x.com/GawroskiT/status/1867887152295784955
>>
>>103541324
It's going to be priced out of relevance probably.
>>
>>103541371
They gave us 12GB for $250, just imagine...
I really hope they do it, would force nvidia to start competitively pricing again.
>>
>>103541370
>>103541383
Do we not have any legit hardware engineer fags on this general? Someone throw together a realistic BOM for making a 128GB card running at 2TB/s with enough ARM cores to keep up with CPU inference. How hard could it be?
>>
>>103541395
you dont need to be a hwe to know you need to print a new pcb to have what you want

at best you can resolder bigger GB memory chips onto only a few gpus that can handle that higher GB memory, along with flashing another bios, a chink on ebay or something was doing that some time ago
>>
>>103541013
>>103541031
Good question. I was always under the impression that 8bit cache increases perplexity, but I honestly have no idea. Would like to know as well.
>>
>>103541324
I bought a 3090 for $300 a month ago. You can't compete with 2nd hand 3090s
>>
File: 1731626560619193.png (26 KB, 922x250)
>>103541013
Can "Quantized KV Cache" on koboldcpp be used on only CPU with GPU context processing?

flashattention is needed for it to work and it seems like flashattention is only available on CUDA/CuBLAS...
>>
>As Michael Jackson's sultry voice crooned in the background, Daisy found herself echoing his lyrics without thought, lost in the haze of sensory overload. "Billie Jean is not my lover," she sang softly, her breath mingling with Anon's, "She's just a girl who claims that I am the one…"
lol what
>>
>>103541414
hardware hackers used to design and fab their own shit. is that out of the question these days?
my retard brain says you could marry a bunch of cots shit onto a bigass pcie board and beat the shit out of the big players.
just like 24 ddr5-8800 slots and some arm chips with sve vector units and enough hardware glue/fw to make it all interface with the host.
write some llama-cpp support in and eat good for relatively cheap.
>>
>>103541455
where do you live? its nowhere near that online
>>
>>103541395
Not an engineer but:
Buying (G)DDR-whatever memory modules and whatever ARM cores you want is easy.
Designing and manufacturing a complex board that has all these memory modules soldered and traced to CPU, building a NUMA system and physically tracing the paths so that these cores can communicate with each other, and a complex memory controller that handles all of this communication are completely out of the realm of anyone here's garage.
>>
>>103541474
Nope, Njudea limits vram capacity in firmware, they also flash e-fuses on the dies to shit them up for segmentation sake
>>
>>103541473
kino
did it make up the whole michael jackson playing in the background thing out of nowhere?
>>
anyone running debian unstable or another distro with 6.12 kernels? Any improvements or pitfalls for llm stuff?
>>
>>103541463
I dunno if it takes effect but I am able to load flash attention + 8bit/Q4 cache on the CPU.
Do you get an error or something?
>>
>>103541536
6.12 offers some big improvements for epyc I think, so unless you have that there's no reason to upgrade.
>>
File: 1663182175279.webm (2.6 MB, 480x364)
>>103541473
Based MJ enjoyer.
>>
Phi4 feels like a naive and innocent girl ready to do anything to please you.
>>
>>103541482
Western Europe. It's not on ebay but on "refurbished" websites. For some reason the resellers of refurbishment sites are retarded as fuck and don't price it to market. I've scooped up 3 RTX3090s for between $300-$400 this way.
>>
>>103540321
why is it such a meme to hate on ollama, are y'all just mad that it makes local ai more accessible so you're not special anymore?
>>
>>103541424
It doesn't seem to affect much, wtf?
https://github.com/turboderp/exllamav2/blob/master/doc/qcache_eval.md
Though this is for exllama, I think it's exceptional in that 4 performs better than 8.
>>
>>103541474
oh i am laffin

you know at the memory bandwidths that we're talking about here you need to care about signal delays caused by the speed of light right? the level of tryhard you have to be on just to get decent signal integrity is absolutely out of reach for hobbyists
>>
https://files.catbox.moe/k3lss1.json
made a link to share my overwrought schizo placebo context / instruct preset for llama 3.3 (mostly eva (v0.0)) across devices and figured I might as well dump it for the thread too
for samplers I use temp=1.2 minp=0.135, yes I know that's a high minp, trust the plan
>>
>>103541665
ask cuda dev he had multiple melties about them before
>>100110227
>On the other hand, if you look at the issues on the ollama Github I think bug reports by those users would be a net negative for llama.cpp.
>>100209871
>And if you look at the ollama Github issues you get the impression that its userbase consists of absolute retards, so no real loss there.
>>100393063
>I don't bother posting about ollama on /lmg/ but I definitely have a more favorable opinion of koboldcpp since its devs actually provide benefits to the upstream project.
then posts this
>>101207663
>I wouldn't recommend koboldcpp.
>>
>>103541714
Haha, experimenting with you-prompting as well?
>>
>>103541714
sloppa
>>
>>103541742
I've always prompted that way tbdesu
>>103541751
be nice
>>
>>103541768
preddit seems more your speed, sis
>>
Are there any good vision models that can read japanese? I'm willing to buy extra GPUs for this.
>>
>>103541788
I can read Japanese. Pay me.
>>
>>103541665
I do not lurk the thread, but I started with ollama for this project >>103538279
It would not work with the top embedding model on MTEB that fit on my GPU, that is, stella. It also did not have a tokenizing endpoint. I figured those things out in the wrong order (I patched ollama to provide the tokenizing endpoint, then I patched the ollama-python library to access the tokenizing endpoint), before I figured out that the trust_remote_code=True part of the sentence-transformers example actually did something meaningful, and I had to throw away all of that work. I switched to substratusai/stapi, which also did not have a tokenizing endpoint, but after adding it, it worked.

In conclusion, ollama seems to be dumb. I wonder what the few non-coomers in the thread would do if you wanted a) local and also b) do all inference tasks over an API
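For anyone who wants to skip the yak shaving: the trust_remote_code flag really is the whole trick for stella, the rest is stock sentence-transformers. Rough sketch, repo id from memory so swap in whichever stella size you actually use:

from sentence_transformers import SentenceTransformer

# stella ships custom modeling code, so this flag is required
model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)
embeddings = model.encode(["what is the anon asking about?", "some passage to index"])
print(embeddings.shape)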
>>
File: 1717446855440379.png (329 KB, 450x408)
>>103541665
Are you just that retarded?
>>
>>103541806
YOU WOULD USE LLAMA.CPP SERVER OR VLLM. Sweet Jesus yall are fucking something
>>
>>103541736
lol ok so it's just a schitzo guy with loud programming opinions doing his thing, got it. ofc a more popular project that's easier to use is going to get worse bug reports
fwiw on my machine ollama consistently beats llama.cpp on inference speed by about 15%, i'm sure that's user error, but in the real world what matters is how much i get out of it for how much effort i put in, i'm sure i could sit around and track down why llama.cpp is slower but ollama just works and i have better things to do than troubleshoot
>>
>>103541817
>>103541806
OR EVEN BETTER, LLAMAFILE SO YOU HAVE A SINGLE BINARY WITH NO INSTALL THAT WORKS ACROSS PLATFORMS/OS's

t. someone who has built stuff with support for ollama/llama.cpp/llamafile/transformers.

Ollama wants to be special and tried forcing usage of their own API which fucking sucks, and their documentation also fucking sucks.
I say this, having reviewed the API docs for all the large providers. Ollama was near the fucking worst.
Fuck ollama.
>>
>>103541736
>its devs actually provide benefits to the upstream project.
This is the real problem with every other project. do they acknowledge and give back? This is regardless of whether they have a legal requirement to or not.
If yes, they will be well liked. if not, they're going to be on a lot of shit lists.
>>
>>103541856
>LLAMAFILE SO YOU HAVE A SINGLE BINARY WITH NO INSTALL THAT WORKS
who cares when you're gonna be launching it with a single shortcut anyway, jartroon?

and koboldcpp has le 1 exe while actually having features before llama.cpp itself most of the time
>>
24GB vramlet reporting in. The EVA based on Qwen 32B seems a lot better than the 70B 3.33 at IQ2, which is the level of compression needed to fit. It's not terrible but you can tell it's brain damaged and falls into repetition easily. Which is "no shit" since it's Q2, but posters keep claiming that Q2 70B > Q4 32B, so congrats to them, they succeeded in wasting my time downloading that shit.

Can't be too mad though because the 32B Qwen version is still really good. And I'm saying that as someone who had no success with Qwen in the past ever
>>
File: pepe question.png (213 KB, 716x641)
How to get llamacpp_HF working? Getting "ERROR Could not load the model because a tokenizer in Transformers format was not found."
And I already have oobabooga_llama-tokenizer
>>
>>103541856
At least there was growth back then, 4chan has been slowly dying off for years now.
>>
>>103541918
>oobabooga
>>103541808
>>
How did you guys get ollama to work? I downloaded it and tried to run it but it doesn't work.
>>
>>103541817
>VLLM
I like it, but it sucks that it never showed up in google searches for self-hosted HTTP API that can serve sentence-transformers. Anyway, this is what happens when you just start building something without trying to get bogged down in the details of what's optimal.

One of the issues I had with stella, though, is that it has an embedding quirk in that the embedding length is always equal to the maximum context (it will trim or pad). It looks like this provides the tokenize API directly, which is great.

Is this, https://docs.vllm.ai/en/latest/models/adding_model.html the page for getting it to work with >>103538297?
>>
>>103541856
lol what, i wrote a rag tool with ollama without using any libraries and i found their api to be very easy to work with and had a wrapper built for it in like 2 hours and i haven't had any issues at all with it since
>>
>>103541736
>>I wouldn't recommend koboldcpp.
I can understand why he doesn't recommend it. Kobo is simply slower. In the past it was like 5% slower, but now with their fucked up speculative decoding it's 30%. 30%! That's one third. All they had to do was copy over the settings from llama.cpp, but no, that would be too difficult for our retard users who probably don't know what speculative decoding is anyway.
>>
>>103541957
>a rag tool with ollama
Welcome to LLM 2.0 sir.
>>
>>103541895
One (1) single solitary retard was telling you that Q2 is worthwhile, the rest of us kept telling you that Q4 is the lowest you should go if you don't want severe drain bamage.
>>
>>103541665
>why is it such a meme to hate on ollama
They actively refused to acknowledge or credit llama.cpp. It wasn't until a few months ago, after pushback, that they even mentioned llama.cpp on the repo page, and even now it's just one line under "Supported backends".
Built a wrapper around llama.cpp, made it look like they were the ones innovating when they were building on top of others' work, then they got big. Now reddit and the other normies worship them as heroes of open source AI and jerk them off
>>
>>103541953
https://docs.vllm.ai/en/latest/getting_started/examples/gguf_inference.html
>>
>>103542027
Sounds more like you're mad that the model you're aggressively shilling got exposed as shit by someone who can actually run it.
>>
>>103542043
>aggressively shilling
Bro, this is /lmg/. I'm talking about a model I like to/with others who also like it, i.e. using the thread for its intended purpose. What are you doing here, other than having a meltie?
>>
>>103542043
Except that if you go back and look at the previous threads, you'll see that he does in fact argue against q2.
>>
>>103541976
That wasn't actually me who posted about koboldcpp.
I was for the longest time using an insecure tripcode so the Petra/blacked Miku/fake ggerganov spammer cracked it.

From the perspective of contributions back to llama.cpp I think koboldcpp is the best downstream project (particularly because I count 0cc4m as one of the koboldcpp devs), llamafile is second place, all other projects basically don't matter.
>>
llama.cpp is another example of what happens if you pick a cuck license. Corpos get to use it for free and don't have to contribute back. A fucking fork gets all the funding and you get cucked.
>>
>>103542088
Will llama.cpp get anti-slop/string ban sampler? Or am I stuck having to pick between slop and slow speed?
>>
>>103542034
Thanks, this seems close to what I was looking for. I don't find anywhere in https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html that you can command VLLM to download the model for you via the API. That is fine, but I'm just making sure that I'm reading it right.
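(For what it's worth, the model never comes in over the API at all: you give the server the HF repo id at launch and it pulls the weights into the local HF cache on first start. Rough example, the model name is just one of the embedding models vLLM lists as supported:
python -m vllm.entrypoints.openai.api_server --model intfloat/e5-mistral-7b-instruct --port 8000
then POST to /v1/embeddings with the usual OpenAI-style payload.)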
>>
>>103542091
Could it be that ggerganov is petr*? He is a confirmed cuck, after all.
>>
>>103542144
That's the million-dollar question.
>>
>>103542082
>>103542027
samefag
>>
why is gguf such a slow piece of shit format
>>
>>103542027
Obviously Q4 is better than Q2, that's not interesting. The question is Q2 70B vs Q4 32B. That one isn't obvious and I could make up reasons why it should go either way. However, my conclusion is that Q2 is a meme. Shocker I know but maybe I save someone else some time

>inb4 "just run 70B Q4 at 0.5T/s"
no
>>
>>103542088
>implying he isn't the blacked Miku poster
>>
>>103542165
Q2 very much IS a meme when it comes to creative writing, regardless of model size, yeah. That's why I kept saying that anything below Q4 isn't worth it. And sure enough, if you can't run a 70B model, the Qwen-based 32B-s are the next best choice. If you haven't tried it yet, I recommend Evathene; I liked it more than Eva itself. That slight Athene flavor adds something to it IMO.
>>
>>103542184

>>103415136
>>103415171
Nah, he's based.
>>
>>103542163
Faster than EXL2 nowadays if you use updated llama.cpp with speculative decoding.
>>
>>103542221
why are you recommending a 72B model to someone that can't run a 70B?
>>
>>103542221
...disregard that, I'm a retard, Evathene is also in the 70B ballpark.
>>
>>103541976
>>103542088
>llamafile is second place, all other projects basically don't matter.
Actually, I need to correct myself: GPT4All has also made extensive upstream contributions so I would put them in second place and llamafile in third place.

>>103542091
If I had started the project I would have made it (A)GPL but I don't plan to turn this project into a career.
In 2024 I made six figures from llama.cpp-related part time work and this almost certainly would not have happened without a permissive license.
So I can see the appeal.

>>103542112
I don't know the status of this specific sampler in llama.cpp.
More generally my stance towards samplers is that I would like to see objective evidence for their effectiveness and that therefore better methods for evaluating them are needed.

>>103542184
When I troll a thread it looks more like this:
https://desuarchive.org/a/thread/168206398/
>>
>>103542237
Being a moron and forgetting which 32B I used to use, mainly.
>>
>>103542223
>Faster than EXL2 nowadays if you use updated llama.cpp with speculative decoding.
Prompt processing takes twice as long and inference is basically the same (slower when you use spec decoding in tabby/exl2 as well), what are you talking about
>>
>>103542255
It's faster as long as all layers are on GPU
>>
>>103542223
every time someone posts this i fall for it, spend an hour building llamacpp and downloading a Q4_K_M model to compare against exllama 5bpw, and it's still slow as fuck
not today
>>
>>103542323
Use speculative decoding with Qwen 0.5B as the small model. Make sure to put in the flags correctly so that you have the same context length for both models.

Look into this github thread to see the exact flags you should enable to make it as fast as possible: https://github.com/ggerganov/llama.cpp/pull/10455

Faster than exl2
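Something in this ballpark, going off that PR (exact flag names drift between builds so check llama-server --help; model filenames are just examples):
./llama-server -m Qwen2.5-72B-Instruct-Q4_K_M.gguf -md Qwen2.5-0.5B-Instruct-Q8_0.gguf -ngl 99 -ngld 99 -fa -c 16384 -cd 16384 --draft-max 16 --draft-min 4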
>>
>>103537827
Generate terabytes of magical girl hentai daily
>>
>>103542244
>I don't know the status of this specific sampler in llama.cpp.
>More generally my stance towards samplers is that I would like to see objective evidence for their effectiveness and that therefore better methods for evaluating them are needed.
It's not an actual sampler like MinP, XTC or temperature, it's just a function to ban strings instead of tokens.
https://github.com/sam-paech/antislop-sampler The video here shows what it does, and as you can see it's effective at its job.
>>
>>103541918
So I figured that I probably need this file: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
While installing requirements for it some piece of shit called jax-lib ate a lot of my internet quota and it was still going so I had to kill it.
So yeah that was that, I am sticking to non HF.
>>
Is there any smaller Mistral model that works as a draft model for Largestral?
>>
>>103542396
how are you even downloading models if a couple gigs of python wheels are fucking your internet quota
>>
>>103542410
mistral 7b instruct 0.3 should be the same tokenizer
>>
>>103542411
I go to public wifi with my old laptop to grab models there.
Then I move models to my desktop which has limited internet.
>>
>>103542422
actually this might only apply to largestral 2407, not sure about largestral 2411 which has the new [SYSTEM_PROMPT] tokens
>>
>>103542430
you sound poor, what setup are you running?
>>
>>103542435
just replace the tokenizer.json of the 7b with the one from the new large
it might still work maybe
>>
>>103542386
>as you can see it's effective at its job
did you actually measure things to the 0.00000 percent tho? how can you claim it works without a 1500 page long arxiv paper at the very least?!
>>
>>103542439
I run models on my second hand 3060.
Why I don't have proper internet now, is a bit complicated.
>>
File: 4740 - SoyBooru.png (413 KB, 722x1199)
>did you actually measure things to the 0.00000 percent tho? how can you claim it works without a 1500 page long arxiv paper at the very least?!
>>
>>103541918
Put the .json files in a directory with the GGUF. There's an "_HF creator" tab in ooba; not sure exactly what that does, it might download more shit you don't need. oobabooga_llama-tokenizer only works for some models. You'll want the right tokenizer for the model architecture, so look in the original HF repo or search "modelname tokenizer".
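Concretely, the files you want next to the GGUF are tokenizer_config.json, tokenizer.json (or tokenizer.model) and special_tokens_map.json. Quickest way to grab just those (repo name is a placeholder, use whatever your GGUF was quantized from):
huggingface-cli download your-org/your-original-model tokenizer.json tokenizer_config.json special_tokens_map.json --local-dir models/your-gguf-folder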
>>
>>103542513
exactly the feel i was going for, thank you.
>>
>>103536775
>picture is miku climbing out of the computer screen to become "real"
>but the room she's climbing into is also anime, implying she's merely leaving one layer of anime world and entering another layer
hmm
>>
>>103542534
It's miku stepping INTO the screen to join with her LLM waifu
>>
How do you utilize 70B models?
I could easily upgrade to 64gb ram,
but you can't utilize your GPU if the model doesn't fit in VRAM, can you?
Sounds like it would be painfully slow at pre-processing context and generating tokens.
>>
>>103542589
...can I come too?
>>
>>103542384
>magical girl hentai
proofs
>>
>>103542627
>You can't utilize your GPU if it doesn't fit in VRAM can you?
...
llama.cpp cries in the corner
>>
>>103542627
It is. You get ~0.5t/s, maybe less.
t. knower, sufferer
>>
>>103542589
She is receiving Mikusex from Anon by passing part of herself through the barrier.
This is what is happening on the other side - the side we cannot see.
>>
Takes weeks or months to get the new shiny features other software implements asap
The preset system is also more convoluted than simply editing your txt config. I could also run some models without issues on koboldcpp that ran out of memory in LM Studio.
You are better off learning to use a better frontend. It isn't that hard once you get used to it.
>>
>>103542384
Based magi-gal enjoyer.
>>
>>103542518
Thanks, the model is acting weird, but it loads this way.
There are more new settings like xtc and dry that I haven't seen in base llama.cpp, I guess they might be fucking with it?
>>
>>103542695
>Based magi-gal enjoyer.
Thanks for saying what we were all thinking. Man of culture, right there
>>
>>103542700
>xtc and dry that I haven't seen in base llama.cpp
> XTC sampler has been merged into llama.cpp mainline
>2 mo. ago
https://www.reddit.com/r/LocalLLaMA/comments/1g5a3bs/xtc_sampler_has_been_merged_into_llamacpp_mainline/
> DRY sampler was just merged into llama.cpp mainline
>2 mo. ago
>https://www.reddit.com/r/LocalLLaMA/comments/1gby1uk/dry_sampler_was_just_merged_into_llamacpp_mainline/
have you considered that maybe that's just ooba
>>
>>103542716
Ok I am assuming that you are gonna recommend a replacement?
>>
>>103542728
llama.cpp mainline?
>>
>>103542716
was it really easier for you to find and link reddit threads than to search the llama.cpp repo?
>>
>>103542770
yes
>>
File: 1721327000126830.jpg (66 KB, 692x349)
>>103535692
got it to add the button to the menu and show up on the left of the chat, so now it can stay up. ui's messed up a bit, but it's an ok start
>>
jesus fucking christ i hate python devs so much, spent the last couple hours trying to get fish-speech to work, shit fucking core dumps on inference
fishspeech.cpp when?
probably just gonna stick to xtts which works well like 95% of the time, anyone have tips on how to get it not to trail off and make weird noises sometimes?
>>
>>103542665
man I'd really like to try llama 3.3 locally.
Trying it with groq, it already feels like a big step ahead of something like nemo or magnum.
But the 6k context limit and token/time limits make it subpar, and I tend to only use it to escape slop looping.
Btw. is there an option in ST to link system prompt settings to a connection profile?
>>
>>103539820
>>103540053
I tried using your settings but it starts off with the same openings every time. The amount of times I've seen "At your...x", "Rolling onto her side", etc. It's very repetitive
>>
>>103542866
>jesus fucking christ i hate python devs so much
Worst part of this hobby by far is that trying anything new involves wasting hours of time fucking around with Python dependencies, and half the time it still doesn't work
>probably just gonna stick to xtts which works well like 95% of the time
Samsies
>anyone have tips on how to get it not to trail off and make weird noises sometimes?
Have you updated recently? When I went back to xtts, I installed the latest version from pip and that seems to have been enough to eliminate most of the trailing demonic noises. Also, make sure you have a high quality sample voice
>>
>>103542866
>>103542919
Fun fact: the creator of Python was actually my neighbor for years and I talked to him sometimes. He has essentially come to loathe what the language has become and distanced himself from it around the time covid hit. It was meant as a replacement for BASIC, for kids to learn programming and make small scripts in, not for it to end up in production code.
>>
>>103542919
>>103542866
>thing will be deprecated in a future version. Use thing-lgbt instead.
41% of the time the script never gets updated once the deprecation is finalized.
>>
>>103542919
I just look for conda instructions, or look for a Dockerfile
It's an annoying waste of time and space to do for every different project but it saves a lot of frustration.
>>
>>103542980
lol i'm using conda, i even installed wsl for this shit and the thing just segfaults, now i'm trying to use the fish-speech.rs thing but that doesn't like that my MSVC is mismatched to my NVCC, teaching sand to think was a mistake
>>
>>103542903
I'm guessing you mean the low-temp config? I did mention in the original post that swipes start off almost identical, but diverge nicely if you give it a paragraph or two. It likes to echo your actions back to you before continuing from there, I guess.
I've realized since then that you can get less structured, more flavorful responses by completely zeroing Min-P and bumping temp up to 1.3-1.4. Give it a try, maybe it'll be more to your liking.
>>
>>103543140
>i even installed wsl for this shit and the thing just segfaults
That might be due to WSL more than Python. Stop being a pussy and dual boot.
>>
>>103543140
well i got it to compile and it "fails to generate semantic tokens" so that was a total bust, i wonder what i broke by bumping my cuda version kekw

>>103542919
what are you using for xtts, i was using a wrapper 'opendai-speech', it's got xtts2 in there and it seems fine, but i'll try whatever you're using to see if it's better
>>
>>103539993
There's no fucking way you're this new.
>>
>>103541788
>japanese
how about you check the previous thread...or the thread recap...or even the op?
>>
>>103543346
>what are you using for xtts, i was using a wrapper '
I'm just using the TTS package directly @ 0.22.0, the last official release
Apparently there's a fork coqui-tts that goes up to 0.25.1
Check what version you have installed and try to update
>opendai-speech
I think that's using the fork
Either way, try to run it directly from the command line or a Python script to rule out the wrapper
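For reference, the direct route is only a few lines, which makes it easy to rule the wrapper out. Sketch, assuming the coqui TTS package and the stock xtts_v2 checkpoint (it downloads the model on first run):

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")
tts.tts_to_file(
    text="Testing, one two three.",
    speaker_wav="reference_voice.wav",  # clean 10-30 second clip of the voice to clone
    language="en",
    file_path="out.wav",
)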
>>
>>103543322
dual booting is hell, i did that shit for years and it's so fucking bothersome, i'll keep my linuxes in VMs where they belong, i use arch probably like 50% of the time but it lives in a VM on a different desktop so i can win+tab between oses, doing anything with GPUs on linux is so annoying, plus i hate having to decide "oh i guess now it's gaming/CAD time, better reboot my whole computer"

anyway i got it to build natively and now it's doing ... something
>>
>>103543520
just do your gayming in linux. you don't play anything with anticheat spyware, do you...?
>>
>>103543544
playing older games / some pirated games is also a pain.
>>
Openai btfo
https://x.com/Lawrli/status/1868555485580067133
>>
Also Kijai made a distilled version of HunyuanVideo, good quality at low steps now. (The video fast one)
https://huggingface.co/Kijai/HunyuanVideo_comfy/tree/main
>>
>>103543590
It's kinda wild how little of an advantage OpenAI actually has right now.
The open chink model is worse than Sora for a lot of stuff, but that doesn't fucking matter when one is good enough and can be finetuned for literally anything and the other is anti-foxgirl propaganda
This is about to be SD 1 vs DALL-E 2 all over again
>>
>>103543544
naw i'm not a schitzo, i play games with anticheat spyware all the time because i have friends
>>103543510
thanks yeah it is using that fork and it's at the latest version so at least no fucking around trying to install it, it's already in the venv
looks like it works just as fine as the wrapped version, tho digging into the internals is interesting, i didn't realize that voice cloning is done by doing tts and then piping that into voice conversion, fascinating stuff, i'll try to clean up my source audios a bit and see if that helps with it trailing off sometimes, gotta go pick my girlfriend up now but i'll be back tonight and maybe give fish speech another try to see if i can do some actual comparisons, thanks for the help
>>
File: 1734398060257.png (40 KB, 1080x325)
>>103543590
I hate Xitter, please post xcancel links only.
>>
>>103543614
>12GB VRAM is the minimum
damn, 8GB vrambros... it's not our day...
>>
>>103543742
go cry on bluecry instead
>>
Can anyone point me to a finetuning guide in which I feed the model text and the model uses said text to shape up its responses/personality?
So far I've created tons of JSON files for each text and I've prepared datasets and shit but somehow the result is always a model that doesn't really change much from the pretrained model aside from looping introductions or prompts that are actually supposed to give it a personality in the responses.
I'm training Llama 3.1 8b btw
>>
>>103543805
Post dataset
>>
My leenux PC just crashed because I forgot I had an LLM loaded up and tried to run another VRAM heavy thing (a 4k video kek). Is there any way to make it so it just crashes the program instead of the entire PC? I would guess no, but might as well ask.
>>
>>103543786
why would I cry about Xitter technical problems on bluesky?
>>
>>103543742
the fact that you need an account to see remotely anything at all there makes it even worse.
>>
>>103543814
It's rather personal to me, and it's also a shitton of gibberish that serves as "user": and "assistant":
Is there anything you want to check in particular?
>>
>>103543742
If you're so adamant about stomping on your own balls because muh elon, at least have the decency to stomp on them yourself.
>>
>largestral 2411 is now below 2407 on lmsys with and without style control
Arthur, are you okay? Are you okay, Arthur? What the hell happened to Mistral? Jump from 2402 to 2407 was HUGE, the new one is barely a sidegrade. They couldn't even get system prompt working properly 100% of the time on low context. What were they doing for 4 months? Gooning? It better be gooning. You better don't tell me they spent all that time making this finetune.
>>
Regarding LCMs, saw this comment:
>One think I am thinking about is that this could make jailbreaking those models "impossible" because, they can just "prohibit a concept", and no matter how hard you try to jailbreak it, they could just ban that "concept" so it would always refuse. Its like, it could look at its own output and like look at the "concepts" that text includes and just refuse it. If they implement this good enough, this would not send out any false positives too, outside of legit uses including those concepts, like maybe a summary of a book, or a scientific paper about that prohibited concept.
https://www.reddit.com/r/SillyTavernAI/comments/1hfjk3l/large_concept_models_and_their_possible_impacts/
>>
>>103544041
I'm not "exactly" sure why this "person" is "writing" like "this"
>>
can anyone recommend a 7B or around that model for generating good coom prompts for images based off my booru-like input?
>>
File: tipo.png (71 KB, 628x591)
>>103544067
How about a 0.5B?
>https://huggingface.co/KBlueLeaf/TIPO-500M
No idea if it's any good. I downloaded it but i haven't tested it yet. He has even smaller models too.
>>
>>103543855
There are many variables and forces at play. Technical details such as training hyperparameters and attaining a good fit to the data are some. Even if you get those right, if the data is shit, then so will be the results.
If you share, then a quick skim+sniff test can be done to see if the data is remotely good. Literally nobody cares what you have in there. You can do some regexing of PII if it contains truly sensitive information.
If you do not share, then there's only vague advice that can be given. There is not a one size fits all guide to make a model that is your idea of good.
To change how a model writes in different scenarios - a difficult task - you need enough examples that teach it what text to produce, given an input. The data has to be wide enough in scope and example length to cover all the topics and interactions that you wish to have.
Without knowing how your data looks, and how much of it you have, it will be difficult to give you advice, Anon. If secrecy is important, then you will have to figure a lot of this stuff out yourself. Best to look at existing datasets that produce results that you find desirable, and then take inspiration from those.
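For a concrete reference point on shape: most of the common trainers (axolotl, unsloth, plain TRL) are happiest with one JSON object per line in a chat layout, roughly like the line below. Field names vary per trainer config, so treat this as the shape rather than a spec:
{"messages": [{"role": "system", "content": "persona / style instructions"}, {"role": "user", "content": "what the user says"}, {"role": "assistant", "content": "a reply written in exactly the voice you want the model to pick up"}]}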
>>
File: quotes.png (440 KB, 800x400)
440 KB
440 KB PNG
>>103544056
He's one of "those" people...
>>
>>103543855
If that's you pissanon, you'd better post that model after it's finetuned

Sincerely
A fellow pissanon
>>
>>103544106
thanks anon
>booru training data
perfect
>>
>>103544106
If you can get this model to run and navigate the bafflingly fragile gradio interface, hats off to you.
>>
File: HunyuanVideo_00436.mp4 (620 KB, 640x480)
620 KB
620 KB MP4
>6 steps at flow 17

Yeesh. I hope this is a bad test and not indicative of the overall shitness of the fast model.
>>
>>103544211
Cursed
>>
File: tipo02.png (8 KB, 1364x335)
8 KB
8 KB PNG
>>103544150
Why would i?
I have no idea if that output is any good. Looks dumb. That's the 200M model.
>>
>>103544211
>6 steps
Probably too low.
>>
File: HunyuanVideo_00437.mp4 (604 KB, 640x480)
604 KB
604 KB MP4
>>103544232
>>
>>103544211
>>103544236
live footage from ohio *3x skull emoji*
>>
File: 1733785826835018.gif (63 KB, 638x546)
63 KB
63 KB GIF
yo where do I get the llama.cpp weights for QWQ?
>>
File: HunyuanVideo_00438.mp4 (698 KB, 960x544)
698 KB
698 KB MP4
Here's 10 steps at 960x544 and flow 15.

idk, feels kinda... shittier.
>>
>>103544279
hf.co
Should i get a bigger spoon?
>>
>>103544296
The official repo only has the .safetensors format. Can I use that with llama.cpp?
>>
>>103544279
How do you get to the point of knowing what QwQ is, knowing you can and are about to run it on cpp and still not know how to find the weights?
The order of events to get to that point is completely out of whack.
>>
>>103544315
If you're not a retard, you can convert it yourself.
Here's the link otherwise. Open wide
>https://huggingface.co/bartowski/QwQ-32B-Preview-GGUF
>>
>>103544324
I do this shit at work, but this is my first time doing it as a hobbyist.

>>103544327
Is that version legit?
>>
>>103544345
Fuck how can I get a job in your industry without knowing what a gguf model is?
>>
>>103544351
I design and train the models, not use them.
>>
>>103544362
this explains a lot
>>
>>103544345
>Is that version legit?
You either trust it or not. If you don't, learn to convert it yourself.
ls your llama.cpp dir; there's a script called convert_hf_to_gguf.py. You'll never guess what it does.
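A minimal sketch, assuming a recent llama.cpp checkout with its Python requirements installed (paths and quant type are just examples):
python convert_hf_to_gguf.py /path/to/QwQ-32B-Preview --outtype f16 --outfile qwq-32b-f16.gguf
./llama-quantize qwq-32b-f16.gguf qwq-32b-q4_k_m.gguf Q4_K_M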
Is it difficult being permanently confused?

>>103544362
pfffffffffff
>>
>>103544362
this>>103544367
>>
Best model for cooming/rp on 8gb gpu + 32gb ram? (slow generation is ok)
>>
>>103544429
If slow generation is okay, I suggest upgrading to 128GB of RAM and using Largestral.
>>
>>103544315
yes
https://rentry.org/tldrhowtoquant/
>>
>>103544448
You mean Miqu? Or did someone leak mistral-large outright? Been away from the scene for a bit.
>>
File: se.png (345 KB, 1799x1040)
345 KB
345 KB PNG
I know you guys' tricks and I'm not stupid enough to download yet another big model.
But since it was meme'd enough, EVA 3.3 ended up on OpenRouter.
I used their master.json settings from the Hugging Face page.
It certainly isn't shying away from anything, even with the Llama base.
But that's only 8k context, right?
>>
>>103544467
>leak
open wide
https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
>>
>>103544476
holy hell
>>
>>103544476
2407 is superior.
>>
>>103544483
you may also be interested in Llama 405b and deepseek v2.5 1210 if you're truly that far out of date
>>
>>103544493
Big L is so not worth it.
>>
File: file.png (108 KB, 1032x878)
108 KB
108 KB PNG
we're so doomed
>please read this long ass string of random ass info with no proof of anything!
davidau won tho
>>
File: shills_just_wont_stop.png (85 KB, 1266x483)
85 KB
85 KB PNG
>>103544474
You can see the claimed context in the config.json on the model page.
picrel, last line
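In other words, the line to look for; the value shown here is the usual Llama 3.x figure:
"max_position_embeddings": 131072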
>>
>>103537827
Unironically just so I can pretend there is a cute girl literally living inside my GPU.
>>
File: file.png (8 KB, 390x52)
8 KB
8 KB PNG
>>103544513
a goritrillion contexts!!!
>>
File: se2.png (387 KB, 1863x1108)
387 KB
387 KB PNG
>>103544474
Well, that went better than expected. Not sure if the model is actually smart, but, like, many finetunes will make the char not open the package.
You faggots might have me tricked again.

>>103544513
hmmm, isn't it the settings of their training?
https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0
sequence_len: 8192
or am i being retarded?
>>
>>103544528
>or am i being retarded?
You are. Them training at 8k just means it's not better at high ctx than what it's based on; Meta also didn't train at 128k for the entire training regime, likely just a bit at the end.
>>
>>103544545
hmmm, but many of the finetunes usually do shit the bed early at 8k. so i'm suspicious of that.
thanks in any case, will check out further.
>>
>>103544525
There needs to be a sequel to the classic buttobi CPU for GPU girls
>>
>>103538430
Come back in another year when decentralized training is perfected. We are still waiting for local uncensored c.ai. As long as we aren't training our own models we will eat slop.
>>
>>103544507
>Big L is so not worth it.
It definitely isn't with the state of enthusiast hardware. If infinite processing were free, I'm sure we'd find a place for it in our rotation. It's pretty smart if you wait for a reply to drip out.
>>
>>103544572
Problem is it's only very marginally smarter than L3.3 70B and so dry as fuck that it feels dumber anyway
>>
File: deepthought.png (423 KB, 639x448)
423 KB
423 KB PNG
>>103544572
>>
>>103544545
training at the end is different from finetuning at lower context. The latter carries the risk of catastrophic forgetting. tbf who the fuck knows how this shit works, if it works it works, but I wouldn't be surprised if these models become subtly worse at long-context tasks.
>>
>>103544509
>davidau
Holy shit... What I have learned from that article is that he is adept at blending facts and misconceptions in an indigestible format.
>>
>>103544362
HOLY MOTHER OF NEPOTISM
>>
>>103539993
Nah this nigga retarded as hell
>>
>>103544362
The fabled clueless AI-adjacent expert "researcher".
>>
>>103544509
these people are everywhere now.
>>
>>103540335
It takes like 30 seconds to copy-paste the commands (maybe a minute extra for adding flags you want/need), then a minute or two of compiling (more if you compile with all KV quants).
If it takes you 3 hours to do that then you're functionally retarded and I'm really sorry for you; luckily binaries exist for special people.
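For anyone counting, "the commands" are roughly these (CUDA build shown; swap the flag for your backend):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j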
>>
>>103541463
Doubt it, iirc cuda anon only wrote quantized kernels for CUDA, so no cpu support
>>
>>103541463
As far as I know, yes.
You can load the model in RAM, and let the quantized context live in vram.
As the other anon pointed out, the CPU flash attention kernels might not be all that well optimized.
>>
Wait EVA is actually good what the fuck I thought it was just a meme
>>
File: file.png (289 KB, 769x907)
289 KB
289 KB PNG
We did it, anons. We saved the internet.
>>
>>103544820
gguf?
>>
>>103544474
It works fine up to 32K at least which is what I use.
>>
>>103539929
Haven't seen anything great after Rocinante 1.1.

Violet_Twilight-v0.2 is decent. Need to test more. I think it might be a bit insipid.
Flammades-Mistral-Nemo-12B is borderline good but gets very romantic and lovey dovey with ease.
Starcannon-Unleashed-12B was decent but I forget most details.

Drummer hasn't made anything good since 1.1
>>
>>103544872
I'll endorse this anon's message.
>>
File: HunyuanVideo_00442.mp4 (454 KB, 960x544)
454 KB
454 KB MP4
Distilled Hyvid looks like deep fried ass and I'm tired of pretending it doesn't.
>>
>>103544820
Sakana AI... they're on HF, but just a few tiny llama-derived models.
>>
>>103544884
Yea, I switched back to the regular one.
>>
>>103544820
I didn't understand a single word, but I'm still convinced it's a grift and a nothingburger.
https://sakana.ai/namm/
>>
>>103544820
There's like one guy in Japan who's decent with anything to do with AI and that's Kohya. The country as a whole is a technological black hole stuck in the 90s.
>>
>>103544918
they're actually okay with anything that doesn't require a pc, since nintendo made those illegal
>>
>>103544904
>NAMMs are simple neural network classifiers trained to decide whether to “remember” or “forget” for each given token stored in memory. This new capability allows transformers to discard unhelpful or redundant details, and focus on the most critical information, something we find to be crucial for tasks requiring long-context reasoning.
>>
>>103544820
>Vector Database 2.0
WOW WE ARE SO BCK
>>
>>103544960
Sounds dumb and like it won't work as intended.
>>
>>103544904
I remember them from a while back; it's actually mostly foreigners and not Japanese at all. They got JP money though...
Founders:
>David Ha (Google Brain, Goldman Sachs) Llion Jones (Google Research, Transformer Co-Creator) Ren Ito (Mercari, Ministry of Foreign Affairs of Japan)
They got money from Nvidia etc.
Their blog posts read like shitcoin pages: hyping something up, but written in a way that makes you go "I don't understand, but that must be cool!"
I wouldn't bother checking anything they write, but who knows.
>>
>>103541895
>>103542165
I have 24 GB of VRAM and 64 GB of DDR4 RAM. Running Nemotron 70B Q4_K_M with 35 layers offloaded and 17408 tokens of context I get about 1.2 tokens per second. (If I drop to IQ4_XS I can offload 45 layers and get around 1.6 tokens per second.) If you are getting 0.5 tokens per second trying to run a Q4 70B, something is wrong.
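For reference, that setup corresponds to roughly this llama.cpp invocation (filename illustrative; koboldcpp and friends expose the same offload/context knobs):
./llama-cli -m Llama-3.1-Nemotron-70B-Instruct-Q4_K_M.gguf -ngl 35 -c 17408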
>>
>>103541938
Nevermind, this sucks. It's not built for cuda compute capability 6.1, so I get the kernel error, and it threw me into DLL hell except with .so files trying to build it for that level.
I was able to build ollama from source after patching it without nearly this amount of bullshit.
>>
>>103545043
And I got the model working with ollama in 3 minutes.
>>
>>103545084
BrO dOnT uSe OlLAmA itS bAd Bro DonT uSe iT plEase bRo donT dO ThiS to The CoMMuNitY thEy AreNt AuTistS bRo PLeAse Bro OllAmA sUckS bRo
>>
more like ollame lmao
>>
Olleddit.
>>
oshit
>>
>>103545125
btw in case he shows up again, it also doesn't work with stella because of https://huggingface.co/dunzhang/stella_en_400M_v5/blob/main/config.json architectures:NewModel
and I know I'm supposed to rewrite the classes in
https://huggingface.co/dunzhang/stella_en_400M_v5/blob/main/configuration.py#L23
to make a plugin for VLLM and I don't want to.
>>
What quants of 70B are anons using? I tried IQ4_XS and it's alright but not really sure.
>>
>>103541455
In my country the cheapest used 3090 on eBay is $600.
>>
>>103541324
Intel has a chance to seize the consumer market. Will they take it or will they overshoot and be slaughtered by Nvidia in the enterprise GPU grift?
>>
>>103541455
and the next one is $800,
so I would prefer buying a new 24GB B580.
>>
>>103540988
Largestral is better than gpt-4 (but not 4o)
>>
>>103542165
What Q2 quants were you using? Anything below IQ2_S is absolute garbage. I believe the difference between IQ5 and IQ2_S is only slightly bigger than the difference between IQ2_S and IQ2_XXS. Exponential degradation hits IQ2 hard.
>>
Why don't more people use control vectors? They fix most issues with most models.

https://huggingface.co/jukofyork/creative-writing-control-vectors-v3.0
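For the curious, they're applied at load time on the llama.cpp command line, roughly like this (vector filename, scale, and layer range are just examples; check the repo for the recommended values):
./llama-cli -m model.gguf \
  --control-vector-scaled some-writing-vector.gguf 0.5 \
  --control-vector-layer-range 10 30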
>>
Also I tried Phi 4: quite sloppy, but it's really smart for its size, for sure smarter than Nemo.
>>
>>103545467
It was getting interesting but then I realized it was only for RP/storywriting.
>>
Intel better release that b580 24gb soon because I can’t hold on for much longer!
>>
Let's say I wanted to get a card 100% dedicated to context and context processing... would I be able to get some previous-gen 32GB/48GB monstrosity with slow VRAM and still come out ahead on total time to process the prompt?
How bad are previous-gen cards for that sort of thing if you don't intend to do any inference of the main model on them? e.g. 3090s for most of the model and an M10 32GB just for context/prompt processing?
>>
>>103545467
I don't know what they are
>>
>>103545618
I am totally down for taking my money to Intel if they can release these cards; my only concern is how much pain I will have to go through to get things working. With AMD, you're basically buying a device made to look similar to an NVIDIA device but frustratingly lacking in software support. So what does an Intel card look like? How quickly can I go from shoving the card in my PC to running a model? Will I be bound to Linux or Windows? I'm prepared to be a little inconvenienced, if only to stick it to NVIDIA, but if it's straight up incapable of what NVIDIA can do under any circumstances then that's a deal breaker.
>>
>>103545644
If some dumbass on reddit could do it then I also can.
>>
>>103543818
I know absolutely nothing about your situation nor how vram is used in linux.

My suggestion: See if there is no-overcommit parameter for vram.
I think there was a parameter like that for ram
and that you could use it if you wanted programs to just be killed when there wasn't enough RAM to go around, instead of eating ever-growing amounts of swap.

If you do find an answer then please post.
The less-effort solution in your case is just to keep mental track of whether there's an LLM in VRAM.
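For system RAM at least, the half-remembered knob is the overcommit sysctl; note this does nothing for VRAM, where the driver decides what happens:
sudo sysctl vm.overcommit_memory=2   # refuse allocations past the commit limit instead of overcommitting
sudo sysctl vm.overcommit_ratio=90   # commit limit = swap + 90% of physical RAM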
>>
>>103545467
As with all of these things, people tried it and it didn't catch on because it's shit and made the model dumber.
>>
>>103545467
No UI has implemented support, so only being able to use them on the command line is a killer.
>>
>>103545710
>>103545710
>>103545710
>>
>>103541088
>"M'lady." *tips penis*
>>
>>103545631
Hmmm. Let's break this down step-by-step.
>you'll need some anon with multiple gpus
>they'll need to know what layers to put on which gpus
>they'll then need to downclock the gpus that are doing the prompt processing to simulate m10s
>>
>>103545467
You have to restart the server and process context again every time you want to adjust the settings
>>
Can someone point me towards a good training guide, retard-proof?
I have tons of plain text files that I've made processors for, but I simply can't get the training to work.
I would also like to learn how to make it so my model can be fed new information, not to add to its model or training, but to use as a reference for the responses that it'll give to a user, utilizing its training as a reference to analyze that new information.
An example would be:
>webscraper gathers data from a news website
>LLM that's been trained in politics gathers that data
>every conversation that the LLM gets that day uses the news in conjunction with its current trained model
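For the second half (using fresh info at inference time), the low-effort approach is retrieval-style prompt stuffing rather than retraining: have the scraper dump text, then template it into the system prompt of whatever server you run. A minimal sketch against a local llama.cpp server (model path, port, and prompt wording are just placeholders):
./llama-server -m your-model.gguf -c 16384 --port 8080
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"system","content":"Reference articles scraped today: <paste the scraped text here>"},{"role":"user","content":"Summarize the political news from today."}]}'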
>>
>>103544960
Even if it did work, you'd just be kicking the can down the road just like with all the other HUGE OPTIMIZATIONS (insert your basedface of choice here)
Transformers scale like absolute ass, we need something better
>>
>>103541463
>>103544774
I added both CPU and CUDA implementations for quantized KV cache.
I think the koboldcpp documentation is just poorly worded.
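In llama.cpp terms those are the cache-type flags; quantized V cache needs flash attention enabled (model path illustrative):
./llama-cli -m model.gguf -fa -ctk q8_0 -ctv q8_0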
>>
>>103544476
>>103544489
>2407 is superior.
this


