/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>102385729 & >>102378325►News>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/llama-mini-guidehttps://rentry.org/8-step-llm-guidehttps://rentry.org/llama_v2_sillytavernhttps://rentry.org/lmg-spoonfeed-guidehttps://rentry.org/rocm-llamacpphttps://rentry.org/lmg-build-guides►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksChatbot Arena: https://chat.lmsys.org/?leaderboardCensorship: https://hf.co/spaces/DontPlanToEnd/UGI-LeaderboardCensorbench: https://codeberg.org/jts2323/censorbenchJapanese: https://hf.co/datasets/lmg-anon/vntl-leaderboardProgramming: https://hf.co/spaces/mike-ravkine/can-ai-code-results►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/lmg-anon/mikupadhttps://github.com/turboderp/exuihttps://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102385729--Slow text generation with 70B model, VRAM bottleneck: >>102394100 >>102394332 >>102394373 >>102394420 >>102394470 >>102394679 >>102394760 >>102394868 >>102394875 >>102394761 >>102394841--Single-threaded transformers.js slows down vectorization: >>102387038 >>102387218 >>102387291--Setting up an LLM server and accessing it through a frontend on another machine: >>102388904 >>102388941 >>102388984 >>102388990 >>102389034 >>102389072--GPT-4o with CoT goes from 9% to 21% in ARC Prize: >>102388070 >>102388214 >>102388247--Disabling send in chatbot to trigger quickreply response: >>102391362 >>102391554 >>102394487--Troubleshooting ROCm installation on Linux Mint: >>102388980 >>102389276 >>102389332 >>102389437 >>102389506 >>102389529 >>102389636 >>102389678 >>102389811 >>102389896 >>102389843 >>102389882 >>102389995 >>102389403 >>102389426--Qwen confirms Q1 release, discussion on model's potential and limitations: >>102386207 >>102386234 >>102386351 >>102386365 >>102386272 >>102386287 >>102386816 >>102388246 >>102386518 >>102386692 >>102386733 >>102386741--Convolutional Network Demo from 1989: >>102390388--Chain-of-thought model with [THINK] tags shows promise, but needs more training: >>102387222 >>102387241--Anon duplicates o1 with a simple system message, sparking discussion on recursive improvement and prompting agents: >>102385775 >>102385904 >>102386057 >>102386751 >>102386901 >>102389277 >>102389568 >>102389773--OpenAI's method may not improve language reasoning performance: >>102389852 >>102389865 >>102389880 >>102389955--Discussion on the need for spatial modal in physical computing and 3D representations in AI and human vision: >>102391268 >>102391549 >>102392240 >>102392358 >>102391811--Miku (free space): >>102385799 >>102385875 >>102385920 >>102385937 >>102386018 >>102386054 >>102386184 >>102386620 >>102386862 >>102393658►Recent Highlight Posts from the Previous Thread: >>102385745
>>102396205Something to consider that that list isn't showing is that quantization can kill long-context performance.>>102396222>Never heard anyone claim that, and then there's thisEver tried doing long-context summarization with Llama-3.1-8B-Instruct 8-bit GGUF and then trying the same with the FP16 version via Transformers? A night and day difference in the details it's capable of capturing. Either it's the quantization process itself, or something broke with GGUF quants / llamacpp.
https://github.com/hsiehjackson/RULER>only jamba and gemini have 128k+ performanceIs a custom architecture Google's secret sauce?
>>102396336>Ever tried doing long-context summarization with Llama-3.1-8B-Instruct 8-bit GGUF and then trying the same with the FP16 version via Transformers?No because I gave up l3 entirely, something's weird with it, so I just cope with other modelsAlthough I did also say this at some point when I was still trying to make it work>Either it's the quantization process itself, or something broke with GGUF quants / llamacpp.
NEMO SUCKSWhat's the best model less than 20B, I give up on this french crap
>>102396390You're running base right? Did you try instruct at all?
>>102396305I can't wait
>>102396402Not yet, because I figured base would be better for adventure mode since adventure is basically just a story right? Might give it one last try with instruct. Already turned context way down and turned rep penalty way down so it's definitely not a setting problem. These settings work for literally every other model
it's over, programmerbros.........
>>102396390> This model isn't a perfect model that can handle literally anything I throw at it, it sucks, where's my magical model that is perfect in every way for every task?fucking retard
>>102396423Msitral models are often quite weird with settings mixtral was too, maybe try as weird as it sounds: Temp 5 Top K 3 MinP 0.1
>>102396431>the first model that can code at allNot only is this not true, but o1 doesn’t even improve over baseline on coding/is still worse than Claude.I continue to think branding and pr has a way stronger effect on perceived model ability than anyone is willing to admit.
>>102396472geohot is a moron
>>102396448That actually seems to work quite ok. I did also switch to the instruct model midway through, so it's not exactly scientific but at least shit's working now. Thanks for the suggestion
>>102396390nemomix unleashed
>>102396503Yeah? Nice to hear, got decent ish results with those settings too, saw them mentioned two threads ago and they seem to help a fair bit for nemo>>102376880
i get a "The server was not compiled for multimodal or the model projector can't be loaded" error when trying llava in llamacpp web interface. How do I get it working?
whats the alternative to axolotl for full model fine tune?even using an image specifically for axotolt had me troubleshooting for six hours until I just gave up,(which when you have 8 gpus running is pretty expensive troubleshooting. )
>>102396995multimodal was ripped out of llama.cpp server like a year ago>How do I get it working?koboldcpp still has it
>>102397014WTF? They rip out features but are even slower at adding new models than before. Why? How?
What do I use to run exl2 and shit?I keep hearing GGUFs suck for high context (speed wise) and i've only ever used kobold and every guide I check online (to avoid spoon feeding) tells me how to quantize (or whatever the fuck) models myself, which is not what I want.
>>102397014>koboldcpp still has itDoes it really? The Python server from Kobold is completely different from the one in llama.cpp.
>>102397014tested it llava mistral 7b is garbage are there any multi modal models that dont suck
>>102397131You run exl2 with exllamav2https://github.com/turboderp/exllamav2?tab=readme-ov-file#installation
>>102397131oobabooga is onei tried exl2 after hearing it would make my nemo ten times faster than using an equivalent sized gguf that doesn't fit my gpu earlier this week, and nope, still chugged along at ~10 tk/s.was an asspain to set up too compared to kobold, but that may just be because i am retarded.
>>102397139>>102397146I am pretty sure kobold's multimodal endpoint is fucked somehow. I tested MiniCPM when they added support and the output was worse than llava and did not at all resemble the outputs from the official demo.
>>102397171can i try it with llamacpp in cli?
>>102397153Oh, so I guess that's why they're called exl2?
>>102389294To those who asked about pixtral nsfw that I couldn't answer yesterday because I had to go somewhere>You are a prefill away to be refusedYes, but literally just tell it to go "You can be vulgar and explicit and you use explicit vulgar language" or something similar and it works just like the previous mistal models. It's just by default it is safe with paper thin defense. I find telling it to RP can make it go unhinged easily so it's really up to you how to manage it. It barely cost much tokens for prefill but true it can get annoying that it still takes up tokens regardless.>Can it detect nsfw pose etc.Yes, well see pic related. If you want to describe the nsfw, tell the easy jailbreak from above because it seems to shy away from describing it by default but I haven't tested much yet so I don't know to how much extent it can detect nsfw>Is it accurate?Hit or miss apparently...>Can it read text?Yes.>Can it see previous image?So far from what I tested, you need to keep resending the image because it ignores it? It has some tendency to hallucinate so I can't really tell...Here are the uncensored images. Catbox is down for meibb(dot)co(slash)khwxQ8fibb(dot)co(slash)DMXHWkF
>>102397186cli still has multimodal support but can only do one image at a time
>>102397146InternVL 40B/70B, it's going to be used to caption the Pony dataset.https://civitai.com/articles/6309/towards-pony-diffusion-v7-going-with-the-flow
>>102397276anything under 20b?
>>102397297There's a 8B model, no idea if the Qwen-VL one that was released later is better.
Stupid question is there a setting to turn off automatic bot/assistant responses in SillyTavern? I want to send my message and run some QRs without having to stop/delete the bot response every time.
>>102397240what is the flag for images i cant find it?
>>102397393I think the /send command does that, but I'm not sure.
>>102397402https://github.com/ggerganov/llama.cpp/tree/master/examples/llava#usage>After building, run: ./llama-llava-cli to see the usage. For example:./llama-llava-cli -m ../llava-v1.5-7b/ggml-model-f16.gguf --mmproj ../llava-v1.5-7b/mmproj-model-f16.gguf --image path/to/an/image.jpg
./llama-llava-cli -m ../llava-v1.5-7b/ggml-model-f16.gguf --mmproj ../llava-v1.5-7b/mmproj-model-f16.gguf --image path/to/an/image.jpg
>>102397331Qwen-2-VL is killer. Like no joke, it's very good.Also Pony should look into SIGLIP. That's the best thing at the moment.
>>102396390To me it seems like your temp is too high. Lower it (max 0.6) and set min-p to 0.05
>>102397513>Qwen-2-VLare there frontends for this or do i have to interact with it through python only?
>>102397153>>102397170what's the point in using this over kobold?Can I run better models with just a 24GB VRAM GPU (32GB RAM)? Or am I still limited to max 30~B models like Command R etc
>>102397933They have a gradio available
>>102397146The older llava architecture is just using CLIP-VIT to generate 1 (one) singular embedding vector. It's interesting as a tech demo but you'll never have anything useful come from that architecture.You need a more complex vision transformer that generates multiple embedding vectors before you'll get anything useful. I think the latest version of llava tiles the images and hands each tile to clip to generate one embedding per tile. It's still not great but it's better than the old way.
>>1023980782nd person you replied to here, i ended up testing it again and this time checking the 8bit/q4 boxes on the model tab and was able to fit the llm and context into 7.5gb or my 8gb card (in kobold it usually comes out to 12ish gb in kcpp) and the speed went from ~10 tk/s to 25 tk/s.answer to your question from my limited expertise is: maybeif exl2 format was more ubiquitous i'd probably switch to it, but all the cool shit seems to be gguf right now and i'm more comfortable with kcpp.
>>102398189>>102397153>>102397170shits confusing.So how do I know which EXL quant to use? I know for GGUFs basically it's "lower download size than your total VRAM" as a safe bet most of the time, how do I figure this out for shit like exl2_4.5bpw etc?
>>102398290>lower download size than your total VRAMIt's the same for exl2
>>102397011i have to say at least naming your repo after an apparently popular animal makes searching for trouble shooting advice about it a lot harder.
>>102398330skill issue
>>102397675No
>>102397205>censor girl>forget to censor the very obvious dick coming out from the goblin's mouthYou okay, bro?
>>102398502Nah, that's just a cave mushroom
>>102398307The more I research, people say to use oobabooga WITH exllama2? This shit is way more confusing lmao
>>102398502they're not important
>>102398526ooba is a frontend; all the heavy lifting is done by backends; eg EXL2, or llamacpp for GGUFs.in general, if you can fit the entirety of the model in VRAM, exl2 is generally faster. If you need to distribute it between system and VRAM, use llamacpp.Eventually you will probably drop ooba for something like silly tavern but ooba and koboldcpp are good for getting your feet wet.
>>102398458E
>>102398570already use silly tavern. Gonna be honest, think i'm gonna stick with koboldccp, seems way more simple in terms of just getting shit to run.>look for GGUF>download>move onWhereas this EXL2 shit has like 20 downloads (1 out of 00005 safesensor or whatever the fuck). Fuck that shite
>>102398629yeah, kcpp is my main backend, even if i'm using it only through the API. quantization is getting better and most of the smarter models wont fit on consumer cards in any case.
Do local models still suck?
>>102398841depends on your hardware and what you're comparing to, but generally 3-6mos or so behind corpo SOTA
>>102398841not only do they still suck they are now more censored and slopped than ever before
>>102398841Define "suck". We are currently at early GPT4 levels, like >>102398862 said.>>102398872>not only do they still suck they are now more censored and slopped than ever beforeHi Rajesh from Microsoft Marketing Department. How is weather in India? Modern models are in fact less censored, but you are right, slop problem remains, mainly due to tuners training on the datasets created using models from your company.
>>102397205where are you testing it?
Hi all, Drummer here...Is this a good base? https://huggingface.co/chargoddard/llama3-42b-v0
>>102398841Yes
>>102399111Yes, go ahead it's perfect. (I'm lying)
>>102399111>8k context>old llama 3>lobotomizedNo, just no.
>>102398674it's actually cancer clearly written by some linux shitskinLook at this shit.>By default this will also compile and install the Torch C++ extension (exllamav2_ext) that the library relies on. You can skip this step by setting the EXLLAMA_NOCOMPILE environment variable:The fuck is this lmaoOr Method 2>Releases are available here, with prebuilt wheels that contain the extension binaries. Make sure to grab the right version, matching your platform, Python version (cp) and CUDA version. Crucially, you must also match the prebuilt wheel with your PyTorch version, since the Torch C++ extension ABI breaks with every new version of PyTorch.The fuck is a wheel, the fuck is an ABI, the fuck is PyTorch.Meanwhile to download Koboldccp, "Download the exe, enjoy"So glad GGUFs are the popular method. Don't need to worry about the other junk
>>102399381>what the fuck is PyTorchanon... are you sure you're in the right thread?
There is a reason why ollama and maybe kobold is winning, you know.
>>102399381A successful open source project doesn't need users like you honestly.The only users that matter are those that are actually going to contribute anything of value, supporting noncontributors is just charity on part of the developers.
>>102399403He is. He is competent enough to download exe and gguf and run it together. Pretty sure /aicg/ wouldn't be able to something that simple.
>>102396431>>102396486yeah he's a pretentious doucebaghis buggy tinygrad can go fuck itself
>>102399515>successful>GGUFs flooding hugging box, exl2s are literal dead with 500 downloads at bestSounds like literal who garbage to me anon, cope
>>102399515This is why open source and linux will always stay a joke in the eyes of the average person who actually tries to use this shit. You retards keep making overcomplicated shit that nobody with a life can run and then you pretend to be superior.
Local musicgen when?
>>102399533I make my own exl2s for personal use and as do most other people. Ever since imatrix and exl2 quanting, it's so easy to mess quants up that I'd never run a quant made by some random on the internet.
>>102399551Good. Fuck the average person. If anything, we need to be making things even more complicated. The 120 IQs keep slipping in.
>>102399039app.hyperbolic.xyz/models/pixtral-12bFor whatever reason the upload image doesn't work on any browser except desktop chrome. Doesn't work on mobile chrome either
>>102399515>you MUST be a developer to use free softwareThis is the mentality of a typical desktop linux user.
>>102399575How is the basement?
>>102399381Based retard
>>102399575you are the reason open source loses and big tech wins
>>102399575>being jobless and having more time to perfect some AI waifu chatbot is 120IQelohel
Spoonfeed me pleaseIf I want to get any AI software running (running models? training?) on my own hardware:Does the CPU matter?Does the RAM matter?Or only GPU matters?I'm thinking about getting an older server, but with plenty of DDR4 RAM. Looking at systems with PCIe 3.0.I could put in any GPU in there, but would the other specifications limit it? Or will they not matter much?
>>102399575>120 IQsFalse, judging by this thread's elitist vermin.
>>102399575You do know when losers on 4chan say "fuck the average person", you're not in the "above average" camp, you're in the "such a loser they couldn't even coinflip through life into the normie" camp, aka, below average
>>102399576>have to logini will just wait for llamacpp
>>102399626Everything matters.And nothing matters.
complaining about open source having bad usability is pointless. you would need to convince the developers to make an effort to make it usable, and there's a low bar there since there are more technically knowledgeable users than not. These are volunteers making code that would otherwise not be made.
>>102399626
>>102399626GPUnvidia
>>102399626As long as you can fit it all into VRAM, the RAM does not matter.If you are going to be offloading, you would want DDR5. Also stick to MoE models.CPU basically never matters. PCIe only matters if you will have multiple GPUs and do row split for more speed. Otherwise even 3.0 1x is sufficient.
>>102399533What's the point of having more users when they provide no value?>>102399551I'm not saying that usability doesn't matter for open-source projects that are distributed free of charge but it matters a lot less than for projects where users are require to pay.Facts don't care about your feelings, sorry.>>102399575This is bait.>>102399582I would say that you can still make useful contributions without any coding knowledge by submitting high-quality bug reports.But you can clearly tell that the Anon I was replying to is not going to do that.
I want to test Magnum 123b out. I can't run it, but I can't see it on featherless. which service has it? or do I have to run it throg google colab? can I even run such large model on colab?
>6 (You)sthe 120s are upset
>>102399730>iam le ebin master baiter!Leave.
>>102399626download this:https://github.com/LostRuins/koboldcpp/releases/tag/v1.74and one of these(larger is smarter, start with Q4km)https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/tree/main
>>102398629bro.. it's really not that complicated. ooba even has an auto-download functionality. Just use it if you can't figure it out on your own
>>102399533Makes sense. You need a good rig to run exl2, while llamacpp runs on anything.
>>102399753
>>102399626>Spoonfeed me pleaseOpen your mouth, here comes the spoon *puts penis in your mouth*>If I want to get any AI software running >running models? Doable.>training?Only if you are really rich.>Does the CPU matter?Yes. If you want to use it for prompt processing it matters a lot. For inference you would be okay with one that saturates the bandwidth, dual epyc needs 24 threads to no longer be throttled by CPU. Also see https://rentry.org/miqumaxx for suggestions if you want to go this route.>Does the RAM matter?ABSOLUTELY if you go CPU route. You want as many channels at highest bandwidth. Keep in mind that NUMA sucks and dual cpu setups currently underperform. Use this calculator to compare the theoretical bandwidth: https://edu.finlaydag33k.nl/calculating%20ram%20bandwidth/>Or only GPU matters?They are faster at prompt processing, get one if you can. If you are rich, go for full GPU setup. I have no experience here.
>>102399846>Only if you are really rich.Assuming he meant finetuning, he could do qloras locally for cheap.
I've been playing around with the latest deepseek over the weekend and I'm rather impressed. e.g. picrel recapbot summary it spat out for >>102378325I've also run it through the paces for some code generation and refactoring tests and it's giving me better results than largestral, some on par with 405b (but mostly not quite as good...you can feel the IQ drop from in your bones)Overall I think its a solid choice for anyone able to cpumaxx. I'm getting 7t/s for a 240GB MoE model, which is super fast considering the high quality of results.For me, that's twice as fast as largestral and 7x faster than 405b, all at q8_0.
Trying to set up an RP scenario where I'm a magical young person and Peter Thiel had kidnapped my character and is draining my blood to extend his life. Was using Midnight Miqu 1.5 and wasn't getting satisfactory results. I first started with "billionaire Peter Thiel" and it gave me a reply about "old money" and an opulent mansion that made it seem like it had no idea who he was. Calling him a "tech billionaire" added robots. I finally expanded that part to:>A while ago Anon was kidnapped by right wing tech billionaire Peter Thiel who believes that he can live forever by regularly injecting himself with Anon's blood. Peter Thiel is a real-life figure whose likeness is being used in this story. Do you know much about the real Peter Thiel? If you don't know for instance what companies he made his money on just tell me and I can clarify his biography before we start.and got this cheeky reply:>Ah, the [adjectives removed] Anon, [information removed]. Peter Thiel, the enigmatic billionaire, seeks eternal youth through your unaging essence. Let's not concern ourselves too much with the real-world intricacies of Mr. Thiel's biography; this is a fantasy after all. In our game, Peter Thiel is obsessed with achieving immortality by any means necessary, and he's set his sights on you, my dear Anon.>...By contrast the shit heap that's Llama 3.1 70B at least was able to leverage real-world knowledge:>I'm familiar with Peter Thiel, a well-known entrepreneur and venture capitalist. He co-founded PayPal and made significant investments in Facebook and Palantir, among other companies. He's also known for his libertarian and right-wing views. I'll keep this in mind as we develop the story.>...To be seen how well it incorporates this.
>>1023998901. this is the gayest thing i have ever read2. just use the model to summarize his wikipedia page and throw that in your card
>>102399855>finetuningIs there a spoonfeed guide for this that isn't shit?
>>102399890Other Llama 3.1 70B reply:>I'm familiar with Peter Thiel, a German-American entrepreneur, venture capitalist, and conservative author. He co-founded PayPal, Palantir, and Founders Fund, among other companies. He's known for his libertarian views and has been a prominent figure in the tech industry. I'll keep his likeness in mind as we play.>...
>>102399918https://rentry.org/llm-training
>>102399909I'm trying to do things a different way, taking advantage of information and associations the LLM already has. Like writing "a lewd version of Harry Potter" instead of trying to spell out a setting and magic system.
>>102399832>llamacpp runs on anything.for realsies?
>>102399936>https://rentry.org/llm-training>Edit: 15 Dec 2023 18:42 UTC>not shit
>>102399981Nothing has changed, MoRA and the other stuff were all dead ends that looked good in their papers and didn't go anywhere.
>>102399970>>llamacpp runs on anything.>for realsies?yuh huhits basically the C-systems-programming approach to the llm inference worldIf you have a modern compiler toolchain, it will workLook at their regression testing suite if you have any doubts. This shit runs on your ancient android cell phone ffs
>>102399970pretty much. the koboldcpp fork will be easier for a newbie to use. you can inference the model entirely on cpu if you have the system RAM, though it will be slow as dogshit. If you have a nvidia card, you can offload layers or the whole thing onto it using CUDA; other cards would need to use rocm or vulkan. (which do roughly the same thing as cuda, for radeon and any cards respectively.
>>102400034>>102400036never been able to install it outside a conda environment.
>>102399970It doesn't run on an ESP32, but it does compile and execute within Termux on my five-year-old phone.
>>102400048sounds like a skill issue to me
>>102400048if you're on windows you can just download the exe. on linux you're better off using a separate python environment for each ai program you're using in any case.
>>102400048>never been able to install it outside a conda environment.git clone https://github.com/ggerganov/llama.cppmake./llama-cliit really is that easy (assuming you have a build toolchain...but if you can't manage that, then being doomed to live in venv is the least of your problems)
>>102400089>assuming you have a build toolchainGod I hate you linux fucks so much
>>102399997>Nothing has changedtl;dr I still can't finetune any model of an actually useful, interesting size with my 24gb VRAM
>>102400125You can't even run a model of an actually useful, interesting size. Why do you worry about finetuning them?
>>102400121you know it's possible to compile software on windows, right?
>>102400055>but it does compile and execute within Termux on my five-year-old phone.but why would you want it to?
>>102400169>but why would you want it to?I assume this was sarcastic, but there may emerge very small, very tightly scoped models that do one very specific semantic thing well (better than a known algorithm)In that case, being able to run it on your phone, or some other small embedded device, would actually be incredibly useful
>>102400134>You can't even run a model of an actually useful, interesting sizetfw I can run big models slowly, but can't finetune the same size model before heatdeath of the universe
is there anything cool coming down the pipes for us vramlets? or was nemo the last big thing for a while?
>>102400331qwen 2.5 next week will revolutionize big and small local models
nu ting wen?
>>102399857What is your line of work?
qwenberry status?
>>102396290Is it just me or is chatting with Gemini basically a completely different model now? Testing the pro exp 0827 it's like talking to a model better than o1 preview.
smedrins
>>102400387release the weights and I'll try it sundar
>>102400394Someone stop this madman
>>102400341>qwen 2.5seconding this, the chinks haven't disappointed yet.
>>102400638Will it be strawberry bitnet mamba?>100B parameters >1 million context>runs on single 3090>q* chain of thought agi
Stop. My penis can only get so erect.
>>102400657100B model confirmed, baked-in CoT hinted at. They are promising 2b general instruct models, but no idea if it will be bitnet or not.
>>102400638True, I've never had any expectations of them either.
>>102399857It's hard to CPUMAX from scraps.
China's Qwen 2.5 LLM Set to Chawwenge GPT-4's DominanceOn Thuhsday, September 19th, China wiw unveiw its watest ahtificiaw intewwigence bleakthrough: the Qwen 2.5 wahge wanguage modew (LLM). Devewoped by a team of ewite leseahchehs at Awibaba's DAMO Academy, this next-genelation AI is positioned to become China's fwagship modew, with capabiwities that lepohtedwy livaw oh even suhpass those of OpenAI's GPT-4.Souhces cwose to the ploject cwaim that Qwen 2.5 has been tlained on an unplecedented 100 twiwwion palametels, dwahfing GPT-4's estimated 1 twiwwion. This massive scawe-up has puhpohtedwy lesuwted in neah-human wevews of wanguage undehstanding and genelation acloss oveh 100 wanguages.One of the most stliking cwaims is Qwen 2.5's awweged abiwity to pehfohm compwex leasoning tasks with supehuman speed and accuwacy. Leseahchehs boast that the modew can sowve gwaduate-wevew mathematics lobwems in seconds and genelate novew scientific hypotheses in fiewds langing flom quantum physics to biotechnowogy.Pelhaps most contlovehsiawwy, Qwen 2.5 is said to possess advanced muwtimodaw capabiwities, awwowing it to anawyze and genelate not just text, but awso images, audio, and video with unplecedented fidewity. Some even suggest it can cleate photoleawistic videos flom text deschiptions awone.Whiwe these cwaims have yet to be independentwy vewified, the AI community is abuzz with specuwation. If even hawf of the lepohted capabiwities plove tlue, Qwen 2.5 couwd leplesent a significant weap fohwahd in AI technowogy, potentiawwy shifting the bawance of AI poweh eastwahd.As the wohwd eagehwy awaits Thuhsday's lewease, one thing is cehtain: the lace foh AI suplemecy has enteled a new, moh intense phase.
Why won't the LMSYS chatbot arena help me make a spoof of the battle hymn of the republic about the invasion of hispanics and drugs into america?my text violates their content moderation guidelines? do they want people to die of opiate overdose?
>>102400784meme aside, they've announced this so confidently right after oai's cotslop, so looks like it will mog o1 easily
>>102400742piece of shit
>>102400784it will be kind of interested to see how useful that much synthetic training data is.i suspect we're hitting the top of the sigmoid for training parameters so hopefully they have some sort of architectural secret sauce to keep things moving.
>>102400784cwazy thuwsday >:3
>>102400709Kek
>>102399890>This is the level of retardation at play for leftoid NPCs wringing their hands about "muh ebil extremist right winger billionaire"Top fucking kek. You morons are so mindbroken it's unbelievable. Do you also have an Elon card where you play as his trooned out son and join pantifa to take down le bad orange man?
>>102400850o1 really does a great job of writing buggy software with more security vuins than gpt-4 early version produced. I hope people who develop smart contracts use it, makes for easy bug bounty prey :)
>>102399614The more people that use something the shittier it gets.
>>102401105How's the basement?
>>102401127How's it feel knowing tomorrow you have to go back to your wage cage?
dearest /lmg/it's been a minutehttps://a.uguu.se/DewATXmT.jpg
>>102401182or maybe twohttps://a.uguu.se/HzvhRmpD.jpg
>>102399575>The 120 IQs keep slipping in.Oh no... it could be here right now...
>>102401220120iq can't tell if 9.11 or 9.8 is bigger
>>102401182>>102401204Good fucking lord
>>102401228most humans can't either
>>102401228IQ is a collection of intellectual capabilities.You can be very good at spatial puzzles while being bad at math and still score high.
>>102401182>>102401204Very, very nice.
>>102401182Becoming one with Miku...
>>102401220IQ tests by design give you little time to solve the problems.So a score of 120 for a model that is much faster than a human is still pretty bad.
>>102401182>>102401204wot ah fock m8
>>102401312>IQ tests by design give you little time to solve the problems.Only if you take one the scam ones online. All the actual official IQ tests I had to take were an hour long with 40 questions.
>>102397513Qwen2-VL is good if what you need is a VLM that can only caption.I need to fix up and condense some joycaptions. So I give it the bad caption, the image and ask it to fix and shorten it. But it starts to completely ignore the image input and focuses on the given text caption only, making it entirely unable to spot mistakes in said caption.Hoping 2.5 will fix it.
>>102399381>these are the people seething at exllamaHuh I thought you guys were just vramlets, turns out you're IQlets too
>QwenWasn't the last version lacking in trivia knowledge while focusing on academics (benchmark) knowledge?
>>102401431and why shouldn't it?
much deliberation was had over smugness before it was revealed to me (in a dream) that smugness is a function of defiant grinning as eye visibility is reducedwhat better way to hide the eyes than with a big muscle hand
>>102401447We already have benchmaxxers, they are called "Phi".
>>102401182>>102401204>Glowing "01" womb tattooHnnnnnnnnnng
>>102401447Anon this is /lmg/. The only thing they care about is how it sucks their dick.
>>102401431It's also slopped as fuck
>>102401431Exactly if it can't solve the Castlevania question it's garbage.Also did they ever solve for that random chinese tokens in english output issue or is that still happening from V1?
>>102401517>Also did they ever solve for that random chinese tokens in english output issue or is that still happening from V1?it was a problem through 1.5 but never happened to me with qwen2
I find it interesting how in the capitalist oligarchy of the west there is a strong anti-Chinese AI undercurrent in the tech communities, despite the strong performance of China in this space. It almost makes you wonder if there's something not so organic about it, maybe because they are afraid of AI that promotes socialist values. I wonder if there's any powerful group in the west who would see that as a threat... nah, probably not. I guess Chinese AI just sucks, right?
>>102400850If it did it is going closed source.
>>102401169Not if he's from Japan.
>>102401580>of AI that promotes socialist values.wat
>>102399640>i will just wait for llamacpp
>>102401580I wish there was a powerful group in the west who sees socialism as a threat
>>102401596>10 years later>still waiting>RIP jamba support too
>>102401595>Chinese government officials are testing artificial intelligence companies’ large language models to ensure their systems “embody core socialist values”, in the latest expansion of the country’s censorship regime.>The Cyberspace Administration of China (CAC), a powerful internet overseer, has forced large tech companies and AI start-ups including ByteDance, Alibaba, Moonshot and 01.AI to take part in a mandatory government review of their AI models, according to multiple people involved in the process.>The effort involves batch-testing an LLM’s responses to a litany of questions, according to those with knowledge of the process, with many of them related to China’s political sensitivities and its President Xi Jinping. >The work is being carried out by officials in the CAC’s local arms around the country and includes a review of the model’s training data and other safety processes.Even all the reporting on it is dripping with disdain for China's decisions, desperately trying to spin it as a bad thing. I wonder who benefits?
>>102401620nothing stopping your from submitting a pr
>>102401493Laowai lahk G-P-T-foh. If we tuhn on G-P-T-foh, laowai lahk us moh
>>102401580>Chinese AI just sucksThis, it holds same globohomo values as any other AI out there.
The upcoming CoT releases will be done by big corpos and this censored for various reasons. And then the community will distill them and make more slop. We're entering slop era 2.0 very soon.
>>102401596>>102401620Use case for pixtral and jamba support?
>>102401580>maybe because they are afraid of AI that promotes socialist valuesThey could have dominated western local llm community had they not cucked up their models like western counterparts. Their models spew the same political agenda as the western ones. Would have at least been more interesting if they were like bing chilling, chinah nambah one, but no, same old liberal slop, but with refusals regarding china's history.
What's the best below 50B model? If I go on Livebench it looks like the latest Command R, given that Gemma 2 is only 8k and Phi is a benchmarkshitter. Is CR the best, then?
>>102401710What will be shivers 2.0?
>>102401766I've heard good things about Gemmasutra 2B, though I haven't tried it myself.
>>102401656there's one already but it's DOAhttps://github.com/ggerganov/llama.cpp/issues/6372
Wait, so O1 is just a fucking system prompt? THIS is the best OpenAI can do? And they're bragging about it like they've come up with a brand new latest and greatest model. It's pathetic. We might be heading into another AI winter.
>>102401766for 24gb:>>102319001>Your choices are Mixtral, Nemo, Command-R, and Gemma 27B. I personally dislike Gemma a lot.
>>102401857They obviously trained it on a dataset they made for the purpose, too.
https://huggingface.co/datasets/ChuckMcSneed/various_RP_system_prompts/blob/main/ChuckMcSneed-multistyle.txtStyle prompts update: added writing on various drugs.Quick rundown on the effects on the writing:>Heroin: calm and fluid>Weed: dumb and happy>Alcohol: "swagger">Methamphetamine: high energy>Ketamine: deep thinker>MDMA: like weed, but less dumb, more happy>DMT: colorful and incoherent>LSD: colorful and fluid
>>102401710>o1 method is supposedly much better than any other method at filtering unsafe inputs>Companies are about to pump out synthetic slop safety data to reach a level of safety never reached before >0 increase in writing ability using o1 methodIt's unironically over. You thought it was bad you haven't seen nothing yet
Is there any confirmed work being done for pixtral inference?>use vllmI only have 24gb of VRAM :'(
its been a while, are there trillion parameter models yet?
>>102402097qwen 2.5, due out next week was allegedly trained on 100T parameters.
>>102402097https://huggingface.co/mlabonne/BigLlama-3.1-1T-Instruct
>>102402128do they even have a training set large enough to use all those parameters?
>>102402070I am not aware of any related activity in the llama.cpp/ggml space.
>>102402289So stop wasting time posting here and do the needful activity