/g/ - Technology

File: 1714756331701541.jpg (830 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102385729 & >>102378325

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102385729

--Slow text generation with 70B model, VRAM bottleneck: >>102394100 >>102394332 >>102394373 >>102394420 >>102394470 >>102394679 >>102394760 >>102394868 >>102394875 >>102394761 >>102394841
--Single-threaded transformers.js slows down vectorization: >>102387038 >>102387218 >>102387291
--Setting up an LLM server and accessing it through a frontend on another machine: >>102388904 >>102388941 >>102388984 >>102388990 >>102389034 >>102389072
--GPT-4o with CoT goes from 9% to 21% in ARC Prize: >>102388070 >>102388214 >>102388247
--Disabling send in chatbot to trigger quickreply response: >>102391362 >>102391554 >>102394487
--Troubleshooting ROCm installation on Linux Mint: >>102388980 >>102389276 >>102389332 >>102389437 >>102389506 >>102389529 >>102389636 >>102389678 >>102389811 >>102389896 >>102389843 >>102389882 >>102389995 >>102389403 >>102389426
--Qwen confirms Q1 release, discussion on model's potential and limitations: >>102386207 >>102386234 >>102386351 >>102386365 >>102386272 >>102386287 >>102386816 >>102388246 >>102386518 >>102386692 >>102386733 >>102386741
--Convolutional Network Demo from 1989: >>102390388
--Chain-of-thought model with [THINK] tags shows promise, but needs more training: >>102387222 >>102387241
--Anon duplicates o1 with a simple system message, sparking discussion on recursive improvement and prompting agents: >>102385775 >>102385904 >>102386057 >>102386751 >>102386901 >>102389277 >>102389568 >>102389773
--OpenAI's method may not improve language reasoning performance: >>102389852 >>102389865 >>102389880 >>102389955
--Discussion on the need for a spatial modality in physical computing and 3D representations in AI and human vision: >>102391268 >>102391549 >>102392240 >>102392358 >>102391811
--Miku (free space): >>102385799 >>102385875 >>102385920 >>102385937 >>102386018 >>102386054 >>102386184 >>102386620 >>102386862 >>102393658

►Recent Highlight Posts from the Previous Thread: >>102385745
>>
File: 51 Days Until November 5.png (2.58 MB, 1008x1616)
>>
>>102396205
Something to consider that the list isn't showing: quantization can kill long-context performance.

>>102396222
>Never heard anyone claim that, and then there's this

Ever tried doing long-context summarization with a Llama-3.1-8B-Instruct 8-bit GGUF and then doing the same with the FP16 version via Transformers? It's a night-and-day difference in the details it's capable of capturing. Either it's the quantization process itself, or something is broken with GGUF quants / llamacpp.
>>
https://github.com/hsiehjackson/RULER
>only jamba and gemini have 128k+ performance
Is a custom architecture Google's secret sauce?
>>
>>102396336
>Ever tried doing long-context summarization with Llama-3.1-8B-Instruct 8-bit GGUF and then trying the same with the FP16 version via Transformers?
No, because I gave up on L3 entirely; something's weird with it, so I just cope with other models.
Although I did also say this at some point when I was still trying to make it work:
>Either it's the quantization process itself, or something broke with GGUF quants / llamacpp.
>>
File: ClipboardImage.png (37 KB, 1026x220)
NEMO SUCKS
What's the best model under 20B? I give up on this French crap
>>
>>102396390
You're running base right? Did you try instruct at all?
>>
>>102396305
I can't wait
>>
>>102396402
Not yet, because I figured base would be better for adventure mode since adventure is basically just a story right? Might give it one last try with instruct. Already turned context way down and turned rep penalty way down so it's definitely not a setting problem. These settings work for literally every other model
>>
File: image.png (182 KB, 685x846)
it's over, programmerbros.........
>>
>>102396390
> This model isn't a perfect model that can handle literally anything I throw at it, it sucks, where's my magical model that is perfect in every way for every task?
fucking retard
>>
>>102396423
Mistral models are often quite weird with settings; Mixtral was too. As weird as it sounds, maybe try: Temp 5, Top-K 3, Min-P 0.1
>>
>>102396431
>the first model that can code at all
Not only is this not true, but o1 doesn’t even improve over baseline on coding/is still worse than Claude.
I continue to think branding and PR have a way stronger effect on perceived model ability than anyone is willing to admit.
>>
>>102396472
geohot is a moron
>>
>>102396448
That actually seems to work quite ok. I did also switch to the instruct model midway through, so it's not exactly scientific but at least shit's working now. Thanks for the suggestion
>>
>>102396390
nemomix unleashed
>>
>>102396503
Yeah? Nice to hear, got decent ish results with those settings too, saw them mentioned two threads ago and they seem to help a fair bit for nemo
>>102376880
>>
i get a "The server was not compiled for multimodal or the model projector can't be loaded" error when trying llava in llamacpp web interface. How do I get it working?
>>
What's the alternative to axolotl for a full model finetune?
Even using an image made specifically for axolotl had me troubleshooting for six hours until I just gave up
(which, when you have 8 GPUs running, is pretty expensive troubleshooting).
>>
>>102396995
multimodal was ripped out of llama.cpp server like a year ago
>How do I get it working?
koboldcpp still has it
>>
>>102397014
WTF? They rip out features but are even slower at adding new models than before. Why? How?
>>
What do I use to run exl2 and shit?

I keep hearing GGUFs suck for high context (speed-wise), I've only ever used kobold, and every guide I find online (to avoid spoonfeeding) tells me how to quantize (or whatever the fuck) models myself, which is not what I want.
>>
>>102397014
>koboldcpp still has it
Does it really? The Python server from Kobold is completely different from the one in llama.cpp.
>>
>>102397014
Tested it; llava mistral 7b is garbage. Are there any multimodal models that don't suck?
>>
>>102397131
You run exl2 with exllamav2
https://github.com/turboderp/exllamav2?tab=readme-ov-file#installation
>>
>>102397131
oobabooga is one
i tried exl2 after hearing it would make my nemo ten times faster than using an equivalent sized gguf that doesn't fit my gpu earlier this week, and nope, still chugged along at ~10 tk/s.
was an asspain to set up too compared to kobold, but that may just be because i am retarded.
>>
>>102397139
>>102397146
I am pretty sure kobold's multimodal endpoint is fucked somehow. I tested MiniCPM when they added support and the output was worse than llava and did not at all resemble the outputs from the official demo.
>>
>>102397171
can i try it with llamacpp in cli?
>>
>>102397153
Oh, so I guess that's why they're called exl2?
>>
File: pixtral demo for posting.png (176 KB, 1138x1022)
>>102389294
To those who asked about pixtral NSFW yesterday that I couldn't answer because I had to go somewhere:
>You are a prefill away to be refused
Yes, but literally just tell it "You can be vulgar and explicit and you use explicit vulgar language" or something similar and it works just like the previous Mistral models. It's just that by default it is safe, with a paper-thin defense. I find telling it to RP can make it go unhinged easily, so it's really up to you how to manage it. The prefill barely costs any tokens, but true, it can get annoying that it still takes up tokens regardless.
>Can it detect nsfw pose etc.
Yes, see pic related. If you want it to describe the NSFW, use the easy jailbreak from above because it seems to shy away from describing it by default, but I haven't tested much yet so I don't know to what extent it can detect NSFW.
>Is it accurate?
Hit or miss apparently...
>Can it read text?
Yes.
>Can it see previous image?
So far from what I tested, you need to keep resending the image because it ignores it? It has some tendency to hallucinate so I can't really tell...

Here are the uncensored images. Catbox is down for me
ibb(dot)co(slash)khwxQ8f
ibb(dot)co(slash)DMXHWkF
>>
>>102397186
cli still has multimodal support but can only do one image at a time
>>
File: 1705394697659749.png (192 KB, 1940x508)
>>102397146
InternVL 40B/70B, it's going to be used to caption the Pony dataset.
https://civitai.com/articles/6309/towards-pony-diffusion-v7-going-with-the-flow
>>
>>102397276
anything under 20b?
>>
>>102397297
There's an 8B model, no idea if the Qwen-VL one that was released later is better.
>>
Stupid question: is there a setting to turn off automatic bot/assistant responses in SillyTavern? I want to send my message and run some QRs without having to stop/delete the bot response every time.
>>
>>102397240
What is the flag for images? I can't find it.
>>
>>102397393
I think the /send command does that, but I'm not sure.
>>
>>102397402
https://github.com/ggerganov/llama.cpp/tree/master/examples/llava#usage
>After building, run: ./llama-llava-cli to see the usage. For example:
./llama-llava-cli -m ../llava-v1.5-7b/ggml-model-f16.gguf --mmproj ../llava-v1.5-7b/mmproj-model-f16.gguf --image path/to/an/image.jpg
>>
>>102397331
Qwen-2-VL is killer. Like no joke, it's very good.

Also Pony should look into SIGLIP. That's the best thing at the moment.
>>
>>102396390
To me it seems like your temp is too high. Lower it (max 0.6) and set min-p to 0.05
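If you're driving it through a backend API instead of a UI, here's a rough sketch of what those settings look like against a local llama.cpp llama-server (assumes the default port 8080 and its /completion field names; koboldcpp's /api/v1/generate uses slightly different ones, so adjust accordingly):
[code]
# Rough sketch: asking a local llama.cpp llama-server for a completion with
# conservative Nemo-style samplers. Assumes the server is already running on
# the default http://localhost:8080; field names follow its /completion API.
import requests

payload = {
    "prompt": "Write one short paragraph of an adventure opening.",
    "n_predict": 200,    # max new tokens
    "temperature": 0.6,  # keep temp low for Nemo
    "min_p": 0.05,       # cut the low-probability tail
}

r = requests.post("http://localhost:8080/completion", json=payload, timeout=300)
print(r.json()["content"])
[/code]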
>>
>>102397513
>Qwen-2-VL
are there frontends for this or do i have to interact with it through python only?
>>
>>102397153
>>102397170
what's the point in using this over kobold?

Can I run better models with just a 24GB VRAM GPU (32GB RAM)? Or am I still limited to ~30B models max, like Command R etc?
>>
>>102397933
They have a Gradio demo available.
>>
>>102397146
The older llava architecture just uses CLIP ViT to generate 1 (one) singular embedding vector. It's interesting as a tech demo but you'll never have anything useful come from that architecture.

You need a more complex vision transformer that generates multiple embedding vectors before you'll get anything useful. I think the latest version of llava tiles the images and hands each tile to CLIP to generate one embedding per tile. It's still not great, but it's better than the old way.
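For what it's worth, a toy sketch of the difference (hypothetical helper names, not the actual llava or llama.cpp code; just to show why tiling gives the LLM more to work with):
[code]
# Toy sketch only: clip_encode() stands in for a CLIP-style encoder that turns
# one image (or crop) into a single fixed-size embedding; tile() is a made-up
# helper that cuts the image into a grid of crops. Not real llava/llama.cpp code.

def encode_single(image, clip_encode):
    # old approach: the whole picture squeezed into one vector
    return [clip_encode(image)]                 # 1 embedding for the LLM

def encode_tiled(image, clip_encode, tile, grid=(2, 2)):
    # tiled approach: one global embedding plus one per crop,
    # so the LLM sees several image "tokens" and more spatial detail
    embeddings = [clip_encode(image)]           # global view
    for crop in tile(image, grid):
        embeddings.append(clip_encode(crop))
    return embeddings                           # 1 + grid[0] * grid[1] embeddings
[/code]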
>>
>>102398078
2nd person you replied to here. i ended up testing it again, this time checking the 8bit/q4 boxes on the model tab, and was able to fit the llm and context into 7.5gb of my 8gb card (in kcpp it usually comes out to 12ish gb) and the speed went from ~10 tk/s to 25 tk/s.
answer to your question from my limited expertise is: maybe
if exl2 format was more ubiquitous i'd probably switch to it, but all the cool shit seems to be gguf right now and i'm more comfortable with kcpp.
>>
>>102398189
>>102397153
>>102397170
shit's confusing.

So how do I know which EXL quant to use? I know for GGUFs it's basically "lower download size than your total VRAM" as a safe bet most of the time; how do I figure this out for shit like exl2_4.5bpw etc?
>>
>>102398290
>lower download size than your total VRAM
It's the same for exl2
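If you'd rather sanity-check it from the bpw number than eyeball the download size, the napkin math is just parameters times bits-per-weight (rough numbers, ignores a bit of overhead for embeddings and cache):
[code]
# Rough size of an exl2 quant: params * bpw / 8 bytes. Ballpark only; leave a
# couple of GiB of VRAM free for context (KV cache) and CUDA overhead.
def quant_size_gib(params_billions: float, bpw: float) -> float:
    return params_billions * 1e9 * bpw / 8 / 1024**3

for bpw in (4.0, 4.5, 5.0, 6.0):
    print(f"12B @ {bpw} bpw ~= {quant_size_gib(12, bpw):.1f} GiB")
# 12B @ 4.5 bpw ~= 6.3 GiB: tight on an 8 GB card, comfortable on 12 GB.
[/code]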
>>
>>102397011
i have to say, naming your repo after an apparently popular animal makes searching for troubleshooting advice about it a lot harder.
>>
>>102398330
skill issue
>>
>>102397675
No
>>
>>102397205
>censor girl
>forget to censor the very obvious dick coming out from the goblin's mouth
You okay, bro?
>>
>>102398502
Nah, that's just a cave mushroom
>>
>>102398307
The more I research, the more people say to use oobabooga WITH exllama2? This shit is way more confusing lmao
>>
>>102398502
they're not important
>>
>>102398526
ooba is a frontend; all the heavy lifting is done by backends, e.g. exllamav2 for EXL2, or llamacpp for GGUFs.

In general, if you can fit the entire model in VRAM, exl2 is faster. If you need to split it between system RAM and VRAM, use llamacpp.

Eventually you will probably drop ooba for something like silly tavern, but ooba and koboldcpp are good for getting your feet wet.
>>
>>102398458
E
>>
>>102398570
already use silly tavern. Gonna be honest, think i'm gonna stick with koboldcpp, seems way more simple in terms of just getting shit to run.

>look for GGUF
>download
>move on

Whereas this EXL2 shit has like 20 downloads (00001-of-00005 safetensors or whatever the fuck). Fuck that shite
>>
>>102398629
yeah, kcpp is my main backend, even if i'm using it only through the API. quantization is getting better and most of the smarter models won't fit on consumer cards in any case.
>>
Do local models still suck?
>>
>>102398841
depends on your hardware and what you're comparing to, but generally 3-6 months or so behind corpo SOTA
>>
>>102398841
not only do they still suck they are now more censored and slopped than ever before
>>
File: no contribution.png (1.14 MB, 1024x1024)
>>
>>102398841
Define "suck". We are currently at early GPT4 levels, like >>102398862 said.

>>102398872
>not only do they still suck they are now more censored and slopped than ever before
Hi Rajesh from the Microsoft Marketing Department. How is the weather in India? Modern models are in fact less censored, but you are right, the slop problem remains, mainly due to tuners training on datasets created using models from your company.
>>
>>102397205
where are you testing it?
>>
Hi all, Drummer here...

Is this a good base? https://huggingface.co/chargoddard/llama3-42b-v0
>>
>>102398841
Yes
>>
>>102399111
Yes, go ahead it's perfect. (I'm lying)
>>
>>102399111
>8k context
>old llama 3
>lobotomized
No, just no.
>>
File: utter shite.jpg (273 KB, 1324x1091)
>>102398674
it's actually cancer clearly written by some linux shitskin

Look at this shit.

>By default this will also compile and install the Torch C++ extension (exllamav2_ext) that the library relies on. You can skip this step by setting the EXLLAMA_NOCOMPILE environment variable:
The fuck is this lmao

Or Method 2
>Releases are available here, with prebuilt wheels that contain the extension binaries. Make sure to grab the right version, matching your platform, Python version (cp) and CUDA version. Crucially, you must also match the prebuilt wheel with your PyTorch version, since the Torch C++ extension ABI breaks with every new version of PyTorch.

The fuck is a wheel, the fuck is an ABI, the fuck is PyTorch.

Meanwhile, to download Koboldcpp: "Download the exe, enjoy"

So glad GGUFs are the popular method. Don't need to worry about the other junk
>>
>>102399381
>what the fuck is PyTorch
anon... are you sure you're in the right thread?
>>
There is a reason why ollama and maybe kobold are winning, you know.
>>
>>102399381
A successful open source project doesn't need users like you, honestly.
The only users that matter are those that are actually going to contribute something of value; supporting noncontributors is just charity on the part of the developers.
>>
>>102399403
He is. He is competent enough to download an exe and a gguf and run them together. Pretty sure /aicg/ wouldn't be able to do something that simple.
>>
>>102396431
>>102396486
yeah he's a pretentious douchebag
his buggy tinygrad can go fuck itself
>>
>>102399515
>successful
>GGUFs flooding hugging box, exl2s are literally dead with 500 downloads at best

Sounds like literal who garbage to me anon, cope
>>
>>102399515
This is why open source and linux will always stay a joke in the eyes of the average person who actually tries to use this shit. You retards keep making overcomplicated shit that nobody with a life can run and then you pretend to be superior.
>>
Local musicgen when?
>>
>>102399533
I make my own exl2s for personal use, as do most other people. Ever since imatrix and exl2 quanting, it's so easy to mess quants up that I'd never run a quant made by some random on the internet.
>>
>>102399551
Good. Fuck the average person. If anything, we need to be making things even more complicated. The 120 IQs keep slipping in.
>>
>>102399039
app.hyperbolic.xyz/models/pixtral-12b
For whatever reason the image upload doesn't work on any browser except desktop chrome. Doesn't work on mobile chrome either
>>
>>102399515
>you MUST be a developer to use free software
This is the mentality of a typical desktop linux user.
>>
>>102399575
How is the basement?
>>
>>102399381
Based retard
>>
>>102399575
you are the reason open source loses and big tech wins
>>
>>102399575
>being jobless and having more time to perfect some AI waifu chatbot is 120IQ
el
oh
el
>>
Spoonfeed me please
If I want to get any AI software running (running models? training?) on my own hardware:
Does the CPU matter?
Does the RAM matter?
Or only GPU matters?
I'm thinking about getting an older server, but with plenty of DDR4 RAM. Looking at systems with PCIe 3.0.
I could put in any GPU in there, but would the other specifications limit it? Or will they not matter much?
>>
>>102399575
>120 IQs
False, judging by this thread's elitist vermin.
>>
>>102399575
You do know when losers on 4chan say "fuck the average person", you're not in the "above average" camp, you're in the "such a loser they couldn't even coinflip through life into the normie" camp, aka, below average
>>
>>102399576
>have to login
i will just wait for llamacpp
>>
>>102399626
Everything matters.
And nothing matters.
>>
complaining about open source having bad usability is pointless. you would need to convince the developers to make an effort to make it usable, and there's little pressure for that since most of the user base is already technically knowledgeable. These are volunteers making code that would otherwise not be made.
>>
>>102399626
>>
>>102399626
GPU
nvidia
>>
>>102399626
As long as you can fit it all into VRAM, the RAM does not matter.
If you are going to be offloading, you would want DDR5. Also stick to MoE models.
CPU basically never matters. PCIe only matters if you will have multiple GPUs and do row split for more speed. Otherwise even PCIe 3.0 x1 is sufficient.
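Rough fit math if you want numbers instead of vibes (the geometry below is roughly Llama-3.1-8B: 32 layers, 8 KV heads, head dim 128; fp16 cache, halve it if you quantize the cache):
[code]
# Back-of-the-envelope VRAM check: weights file + KV cache + some overhead.
# KV cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_elem.
def kv_cache_gib(layers, kv_heads, head_dim, ctx, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

print(kv_cache_gib(32, 8, 128, 8192))  # ~1.0 GiB at 8k context for an 8B-class model

# e.g. a ~4.9 GiB Q4_K_M 8B file + ~1 GiB cache + some runtime overhead still
# fits an 8 GB card; the same arithmetic tells you whether a 30B-class quant
# fits in 24 GB or needs layers offloaded to system RAM.
[/code]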
>>
>>102399533
What's the point of having more users when they provide no value?

>>102399551
I'm not saying that usability doesn't matter for open-source projects that are distributed free of charge, but it matters a lot less than for projects where users are required to pay.
Facts don't care about your feelings, sorry.

>>102399575
This is bait.

>>102399582
I would say that you can still make useful contributions without any coding knowledge by submitting high-quality bug reports.
But you can clearly tell that the Anon I was replying to is not going to do that.
>>
I want to test Magnum 123b out. I can't run it, and I can't see it on featherless. Which service has it? Or do I have to run it through google colab? Can I even run such a large model on colab?
>>
>6 (You)s
the 120s are upset
>>
>>102399730
>iam le ebin master baiter!
Leave.
>>
>>102399626
download this:
https://github.com/LostRuins/koboldcpp/releases/tag/v1.74

and one of these (larger is smarter, start with Q4_K_M; a scripted download is sketched below if you prefer that):
https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/tree/main
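If you'd rather script the download than click through the browser, something like this works (needs pip install huggingface_hub; the filename is assumed from the repo's current file listing, check the Files tab if it differs):
[code]
# Scripted download of the Q4_K_M quant from the repo linked above.
# Needs: pip install huggingface_hub
# The filename is assumed from the repo's current listing; adjust if it differs.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",  # ~5 GB download
)
print(path)  # point koboldcpp's model field (or --model flag) at this file
[/code]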
>>
>>102398629
bro.. it's really not that complicated. ooba even has an auto-download functionality. Just use it if you can't figure it out on your own
>>
>>102399533
Makes sense. You need a good rig to run exl2, while llamacpp runs on anything.
>>
>>102399753
>>
>>102399626
>Spoonfeed me please
Open your mouth, here comes the spoon *puts penis in your mouth*
>If I want to get any AI software running
>running models?
Doable.
>training?
Only if you are really rich.
>Does the CPU matter?
Yes. If you want to use it for prompt processing it matters a lot. For inference you would be okay with one that saturates the memory bandwidth; a dual Epyc needs about 24 threads to no longer be throttled by the CPU. Also see https://rentry.org/miqumaxx for suggestions if you want to go this route.
>Does the RAM matter?
ABSOLUTELY if you go the CPU route. You want as many channels as possible at the highest speed. Keep in mind that NUMA sucks and dual-CPU setups currently underperform. Use this calculator to compare theoretical bandwidth (the rough math is sketched at the end of this post): https://edu.finlaydag33k.nl/calculating%20ram%20bandwidth/
>Or only GPU matters?
GPUs are faster at prompt processing, so get one if you can. If you are rich, go for a full GPU setup; I have no experience there.
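The bandwidth math that calculator does is simple enough to do yourself, and it also gives you a rough ceiling on CPU generation speed (real numbers come in lower):
[code]
# Theoretical RAM bandwidth and a crude upper bound on CPU decode speed.
# Rule of thumb: tokens/s can't beat bandwidth / bytes-read-per-token, which is
# roughly the model size (or active-expert size, for MoE). Real throughput is lower.
def bandwidth_gbs(channels: int, mts: int, bus_width_bytes: int = 8) -> float:
    return channels * mts * bus_width_bytes / 1000  # GB/s

def max_tokens_per_s(bandwidth: float, model_gb: float) -> float:
    return bandwidth / model_gb

bw = bandwidth_gbs(channels=8, mts=3200)   # 8-channel DDR4-3200 server board
print(bw)                                  # ~204.8 GB/s theoretical
print(max_tokens_per_s(bw, model_gb=40))   # ~5 t/s ceiling for a 40 GB quant
[/code]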
>>
>>102399846
>Only if you are really rich.
Assuming he meant finetuning, he could do qloras locally for cheap.
>>
I've been playing around with the latest deepseek over the weekend and I'm rather impressed. e.g. picrel is the recapbot summary it spat out for >>102378325
I've also run it through its paces on some code generation and refactoring tests and it's giving me better results than largestral, some on par with 405b (but mostly not quite as good... you can feel the IQ drop in your bones).
Overall I think it's a solid choice for anyone able to cpumaxx. I'm getting 7t/s for a 240GB MoE model, which is super fast considering the high quality of results.
For me, that's twice as fast as largestral and 7x faster than 405b, all at q8_0.
>>
Trying to set up an RP scenario where I'm a magical young person and Peter Thiel had kidnapped my character and is draining my blood to extend his life. Was using Midnight Miqu 1.5 and wasn't getting satisfactory results. I first started with "billionaire Peter Thiel" and it gave me a reply about "old money" and an opulent mansion that made it seem like it had no idea who he was. Calling him a "tech billionaire" added robots. I finally expanded that part to:
>A while ago Anon was kidnapped by right wing tech billionaire Peter Thiel who believes that he can live forever by regularly injecting himself with Anon's blood. Peter Thiel is a real-life figure whose likeness is being used in this story. Do you know much about the real Peter Thiel? If you don't know for instance what companies he made his money on just tell me and I can clarify his biography before we start.
and got this cheeky reply:
>Ah, the [adjectives removed] Anon, [information removed]. Peter Thiel, the enigmatic billionaire, seeks eternal youth through your unaging essence. Let's not concern ourselves too much with the real-world intricacies of Mr. Thiel's biography; this is a fantasy after all. In our game, Peter Thiel is obsessed with achieving immortality by any means necessary, and he's set his sights on you, my dear Anon.
>...
By contrast the shit heap that's Llama 3.1 70B at least was able to leverage real-world knowledge:
>I'm familiar with Peter Thiel, a well-known entrepreneur and venture capitalist. He co-founded PayPal and made significant investments in Facebook and Palantir, among other companies. He's also known for his libertarian and right-wing views. I'll keep this in mind as we develop the story.
>...
To be seen how well it incorporates this.
>>
>>102399890
1. this is the gayest thing i have ever read
2. just use the model to summarize his wikipedia page and throw that in your card
>>
>>102399855
>finetuning
Is there a spoonfeed guide for this that isn't shit?
>>
>>102399890
Other Llama 3.1 70B reply:
>I'm familiar with Peter Thiel, a German-American entrepreneur, venture capitalist, and conservative author. He co-founded PayPal, Palantir, and Founders Fund, among other companies. He's known for his libertarian views and has been a prominent figure in the tech industry. I'll keep his likeness in mind as we play.
>...
>>
>>102399918
https://rentry.org/llm-training
>>
>>102399909
I'm trying to do things a different way, taking advantage of information and associations the LLM already has. Like writing "a lewd version of Harry Potter" instead of trying to spell out a setting and magic system.
>>
>>102399832
>llamacpp runs on anything.
for realsies?
>>
File: doubt.png (945 KB, 885x869)
>>102399936
>https://rentry.org/llm-training
>Edit: 15 Dec 2023 18:42 UTC
>not shit
>>
>>102399981
Nothing has changed, MoRA and the other stuff were all dead ends that looked good in their papers and didn't go anywhere.
>>
>>102399970
>>llamacpp runs on anything.
>for realsies?
yuh huh
it's basically the C-systems-programming approach to the llm inference world
If you have a modern compiler toolchain, it will work
Look at their regression testing suite if you have any doubts. This shit runs on your ancient android cell phone ffs
>>
>>102399970
pretty much. the koboldcpp fork will be easier for a newbie to use. you can run inference entirely on the cpu if you have the system RAM, though it will be slow as dogshit. If you have an nvidia card, you can offload layers or the whole thing onto it using CUDA; other cards would need to use rocm or vulkan (which do roughly the same thing as cuda, for radeon and any cards respectively).
>>
>>102400034
>>102400036
never been able to install it outside a conda environment.
>>
>>102399970
It doesn't run on an ESP32, but it does compile and execute within Termux on my five-year-old phone.
>>
>>102400048
sounds like a skill issue to me
>>
>>102400048
if you're on windows you can just download the exe. on linux you're better off using a separate python environment for each ai program you're using in any case.
>>
>>102400048
>never been able to install it outside a conda environment.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
./llama-cli

it really is that easy (assuming you have a build toolchain...but if you can't manage that, then being doomed to live in venv is the least of your problems)
>>
>>102400089
>assuming you have a build toolchain
God I hate you linux fucks so much
>>
>>102399997
>Nothing has changed
tl;dr I still can't finetune any model of an actually useful, interesting size with my 24gb VRAM
>>
>>102400125
You can't even run a model of an actually useful, interesting size. Why do you worry about finetuning them?
>>
>>102400121
you know it's possible to compile software on windows, right?
>>
>>102400055
>but it does compile and execute within Termux on my five-year-old phone.
but why would you want it to?
>>
>>102400169
>but why would you want it to?
I assume this was sarcastic, but there may emerge very small, very tightly scoped models that do one very specific semantic thing well (better than a known algorithm)
In that case, being able to run it on your phone, or some other small embedded device, would actually be incredibly useful
>>
File: Uh.png (1.74 MB, 896x1152)
>>102400134
>You can't even run a model of an actually useful, interesting size
tfw I can run big models slowly, but can't finetune the same size model before heatdeath of the universe
>>
is there anything cool coming down the pipes for us vramlets? or was nemo the last big thing for a while?
>>
>>102400331
qwen 2.5 next week will revolutionize big and small local models
>>
nu ting wen?
>>
>>102399857
What is your line of work?
>>
qwenberry status?
>>
>>102396290
Is it just me or is chatting with Gemini basically a completely different model now? Testing the pro exp 0827 it's like talking to a model better than o1 preview.
>>
smedrins
>>
>>102400387
release the weights and I'll try it sundar
>>
>>102400394
Someone stop this madman
>>
>>102400341
>qwen 2.5

seconding this, the chinks haven't disappointed yet.
>>
>>102400638
Will it be strawberry bitnet mamba?
>100B parameters
>1 million context
>runs on single 3090
>q* chain of thought agi
>>
Stop. My penis can only get so erect.
>>
>>102400657
100B model confirmed, baked-in CoT hinted at. They are promising 2b general instruct models, but no idea if it will be bitnet or not.
>>
>>102400638
True, I've never had any expectations of them either.
>>
>>102399857
It's hard to CPUMAX from scraps.
>>
China's Qwen 2.5 LLM Set to Chawwenge GPT-4's Dominance

On Thuhsday, September 19th, China wiw unveiw its watest ahtificiaw intewwigence bleakthrough: the Qwen 2.5 wahge wanguage modew (LLM). Devewoped by a team of ewite leseahchehs at Awibaba's DAMO Academy, this next-genelation AI is positioned to become China's fwagship modew, with capabiwities that lepohtedwy livaw oh even suhpass those of OpenAI's GPT-4.

Souhces cwose to the ploject cwaim that Qwen 2.5 has been tlained on an unplecedented 100 twiwwion palametels, dwahfing GPT-4's estimated 1 twiwwion. This massive scawe-up has puhpohtedwy lesuwted in neah-human wevews of wanguage undehstanding and genelation acloss oveh 100 wanguages.

One of the most stliking cwaims is Qwen 2.5's awweged abiwity to pehfohm compwex leasoning tasks with supehuman speed and accuwacy. Leseahchehs boast that the modew can sowve gwaduate-wevew mathematics lobwems in seconds and genelate novew scientific hypotheses in fiewds langing flom quantum physics to biotechnowogy.

Pelhaps most contlovehsiawwy, Qwen 2.5 is said to possess advanced muwtimodaw capabiwities, awwowing it to anawyze and genelate not just text, but awso images, audio, and video with unplecedented fidewity. Some even suggest it can cleate photoleawistic videos flom text deschiptions awone.

Whiwe these cwaims have yet to be independentwy vewified, the AI community is abuzz with specuwation. If even hawf of the lepohted capabiwities plove tlue, Qwen 2.5 couwd leplesent a significant weap fohwahd in AI technowogy, potentiawwy shifting the bawance of AI poweh eastwahd.

As the wohwd eagehwy awaits Thuhsday's lewease, one thing is cehtain: the lace foh AI suplemecy has enteled a new, moh intense phase.
>>
Why won't the LMSYS chatbot arena help me make a spoof of the battle hymn of the republic about the invasion of hispanics and drugs into america?

my text violates their content moderation guidelines? do they want people to die of opiate overdose?
>>
>>102400784
meme aside, they've announced this so confidently right after oai's cotslop, so looks like it will mog o1 easily
>>
>>102400742
piece of shit
>>
>>102400784
it will be kind of interested to see how useful that much synthetic training data is.

i suspect we're hitting the top of the sigmoid for training parameters so hopefully they have some sort of architectural secret sauce to keep things moving.
>>
>>102400784
cwazy thuwsday >:3
>>
>>102400709
Kek
>>
File: 1607026237336.gif (966 KB, 245x180)
>>102399890
>This is the level of retardation at play for leftoid NPCs wringing their hands about "muh ebil extremist right winger billionaire"
Top fucking kek. You morons are so mindbroken it's unbelievable. Do you also have an Elon card where you play as his trooned out son and join pantifa to take down le bad orange man?
>>
>>102400850
o1 really does a great job of writing buggy software with more security vulns than early GPT-4 produced. I hope people who develop smart contracts use it, makes for easy bug bounty prey :)
>>
>>102399614
The more people that use something the shittier it gets.
>>
>>102401105
How's the basement?
>>
>>102401127
How's it feel knowing tomorrow you have to go back to your wage cage?
>>
File: m0.png (90 KB, 240x240)
dearest /lmg/
it's been a minute
https://a.uguu.se/DewATXmT.jpg
>>
File: m1.jpg (103 KB, 526x526)
>>102401182
or maybe two
https://a.uguu.se/HzvhRmpD.jpg
>>
File: iq.png (309 KB, 968x1219)
>>102399575
>The 120 IQs keep slipping in.
Oh no... it could be here right now...
>>
>>102401220
120iq can't tell if 9.11 or 9.8 is bigger
>>
>>102401182
>>102401204
Good fucking lord
>>
>>102401228
most humans can't either
>>
>>102401228
IQ is a collection of intellectual capabilities.
You can be very good at spatial puzzles while being bad at math and still score high.
>>
>>102401182
>>102401204
Very, very nice.
>>
File: DewATXmT.jpg (18 KB, 359x305)
>>102401182
Becoming one with Miku...
>>
>>102401220
IQ tests by design give you little time to solve the problems.
So a score of 120 for a model that is much faster than a human is still pretty bad.
>>
>>102401182
>>102401204
wot ah fock m8
>>
>>102401312
>IQ tests by design give you little time to solve the problems.
Only if you take one of the scam ones online. All the actual official IQ tests I had to take were an hour long with 40 questions.
>>
>>102397513
Qwen2-VL is good if what you need is a VLM that can only caption.
I need to fix up and condense some joycaptions. So I give it the bad caption, the image and ask it to fix and shorten it. But it starts to completely ignore the image input and focuses on the given text caption only, making it entirely unable to spot mistakes in said caption.
Hoping 2.5 will fix it.
>>
>>102399381
>these are the people seething at exllama
Huh I thought you guys were just vramlets, turns out you're IQlets too
>>
>Qwen
Wasn't the last version lacking in trivia knowledge while focusing on academic (benchmark) knowledge?
>>
>>102401431
and why shouldn't it?
>>
File: miku4x.png (657 KB, 622x582)
much deliberation was had over smugness before it was revealed to me (in a dream) that smugness is a function of defiant grinning as eye visibility is reduced
what better way to hide the eyes than with a big muscle hand
>>
>>102401447
We already have benchmaxxers, they are called "Phi".
>>
File: 12323541651112.jpg (29 KB, 320x283)
>>102401182
>>102401204
>Glowing "01" womb tattoo
Hnnnnnnnnnng
>>
>>102401447
Anon this is /lmg/. The only thing they care about is how it sucks their dick.
>>
>>102401431
It's also slopped as fuck
>>
>>102401431
Exactly, if it can't solve the Castlevania question it's garbage.
Also, did they ever fix that issue with random chinese tokens in english output, or is that still happening from V1?
>>
>>102401517
>Also did they ever solve for that random chinese tokens in english output issue or is that still happening from V1?
it was a problem through 1.5 but never happened to me with qwen2
>>
I find it interesting how in the capitalist oligarchy of the west there is a strong anti-Chinese AI undercurrent in the tech communities, despite the strong performance of China in this space. It almost makes you wonder if there's something not so organic about it, maybe because they are afraid of AI that promotes socialist values. I wonder if there's any powerful group in the west who would see that as a threat... nah, probably not. I guess Chinese AI just sucks, right?
>>
>>102400850
If it does, it is going closed source.
>>
>>102401169
Not if he's from Japan.
>>
>>102401580
>of AI that promotes socialist values.
wat
>>
File: 1720984672247185.png (570 KB, 563x750)
>>102399640
>i will just wait for llamacpp
>>
>>102401580
I wish there was a powerful group in the west who sees socialism as a threat
>>
File: image.png (413 KB, 512x512)
>>102401596
>10 years later
>still waiting
>RIP jamba support too
>>
>>102401595
>Chinese government officials are testing artificial intelligence companies’ large language models to ensure their systems “embody core socialist values”, in the latest expansion of the country’s censorship regime.
>The Cyberspace Administration of China (CAC), a powerful internet overseer, has forced large tech companies and AI start-ups including ByteDance, Alibaba, Moonshot and 01.AI to take part in a mandatory government review of their AI models, according to multiple people involved in the process.
>The effort involves batch-testing an LLM’s responses to a litany of questions, according to those with knowledge of the process, with many of them related to China’s political sensitivities and its President Xi Jinping.
>The work is being carried out by officials in the CAC’s local arms around the country and includes a review of the model’s training data and other safety processes.
Even all the reporting on it is dripping with disdain for China's decisions, desperately trying to spin it as a bad thing. I wonder who benefits?
>>
>>102401620
nothing stopping you from submitting a pr
>>
>>102401493
Laowai lahk G-P-T-foh. If we tuhn on G-P-T-foh, laowai lahk us moh
>>
>>102401580
>Chinese AI just sucks
This, it holds the same globohomo values as any other AI out there.
>>
The upcoming CoT releases will be done by big corpos and thus censored for various reasons. And then the community will distill them and make more slop. We're entering slop era 2.0 very soon.
>>
>>102401596
>>102401620
Use case for pixtral and jamba support?
>>
>>102401580
>maybe because they are afraid of AI that promotes socialist values
They could have dominated the western local LLM community had they not cucked up their models like their western counterparts. Their models spew the same political agenda as the western ones. Would have at least been more interesting if they were like bing chilling, chinah nambah one, but no, same old liberal slop, but with refusals regarding china's history.
>>
What's the best below 50B model? If I go on Livebench it looks like the latest Command R, given that Gemma 2 is only 8k and Phi is a benchmarkshitter. Is CR the best, then?
>>
>>102401710
What will be shivers 2.0?
>>
>>102401766
I've heard good things about Gemmasutra 2B, though I haven't tried it myself.
>>
>>102401656
there's one already but it's DOA
https://github.com/ggerganov/llama.cpp/issues/6372
>>
Wait, so O1 is just a fucking system prompt? THIS is the best OpenAI can do? And they're bragging about it like they've come up with a brand new latest and greatest model. It's pathetic. We might be heading into another AI winter.
>>
>>102401766
for 24gb:
>>102319001
>Your choices are Mixtral, Nemo, Command-R, and Gemma 27B. I personally dislike Gemma a lot.
>>
>>102401857
They obviously trained it on a dataset they made for the purpose, too.
>>
https://huggingface.co/datasets/ChuckMcSneed/various_RP_system_prompts/blob/main/ChuckMcSneed-multistyle.txt
Style prompts update: added writing on various drugs.
Quick rundown on the effects on the writing:
>Heroin: calm and fluid
>Weed: dumb and happy
>Alcohol: "swagger"
>Methamphetamine: high energy
>Ketamine: deep thinker
>MDMA: like weed, but less dumb, more happy
>DMT: colorful and incoherent
>LSD: colorful and fluid
>>
>>102401710
>o1 method is supposedly much better than any other method at filtering unsafe inputs
>Companies are about to pump out synthetic slop safety data to reach a level of safety never reached before
>0 increase in writing ability using o1 method
It's unironically over. You thought it was bad? You ain't seen nothing yet.
>>
File: 1724384031716115.png (883 KB, 832x1216)
Is there any confirmed work being done for pixtral inference?
>use vllm
I only have 24gb of VRAM :'(
>>
it's been a while, are there trillion parameter models yet?
>>
>>102402097
qwen 2.5, due out next week, was allegedly trained on 100T parameters.
>>
>>102402097
https://huggingface.co/mlabonne/BigLlama-3.1-1T-Instruct
>>
>>102402128
do they even have a training set large enough to use all those parameters?
>>
>>102402070
I am not aware of any related activity in the llama.cpp/ggml space.
>>
>>102402289
So stop wasting time posting here and do the needful activity


