/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101439122 & >>101431253

►News
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101439122

--Paper: Q-Sparse: All Large Language Models can be Fully Sparsely-Activated: >>101439990 >>101440042 >>101440064 >>101442269 >>101440609 >>101443047 >>101443222 >>101443349 >>101440134 >>101440147
--GGUF vs EXL2: llama.cpp catches up in speed and KV caching: >>101444001 >>101444083 >>101444756 >>101445121 >>101444182 >>101444302 >>101444819 >>101445612 >>101445706
--Speculations on State-Space Models, LLM Integration with Low-Power Devices, and the Significance of Leveraging Computation: >>101442599 >>101442729 >>101442868 >>101445446
--Seeking Open-Source Project for Local Server with OpenAI Compatible API and Multi-: >>101440611 >>101440898 >>101441409 >>101441511 >>101441574 >>101444786 >>101444898
--Saving Chat History with Ollama CLI and Alternatives: >>101442061 >>101442286 >>101442443 >>101442485
--RTX 2070 and LLaMA V3: Seeking Decent Results: >>101439327 >>101439368 >>101439396 >>101439429 >>101439448 >>101439468 >>101439356 >>101439556
--Embodiment of Core Socialist Values and Re-education: >>101446280
--AI-Generated Videos and Their Impact on Human Creativity: >>101443763 >>101443885 >>101444309 >>101444425 >>101445081 >>101447031
--Pull Request for Chameleon Support in llama.cpp: Current Limitations and Future Improvements: >>101442750
--MoEs: Generally Cheaper to Train but Not Always: >>101439447 >>101442516
--Koboldcpp's OpenAI-Compatible API Endpoints: Not Recommended for the Normies: >>101447084 >>101447218 >>101447537
--GB200 Hardware Architecture and Component Supply Chain & BOM: >>101440208
--Breeding kink in every scenario, how do I spice it up?: >>101440013 >>101440043 >>101440196 >>101440928
--KoboldCpp's New Self-Extraction Feature: Unpacking Binary Releases with Ease: >>101440991
--Miku (free space): >>101439308 >>101439320 >>101447861 >>101447938

►Recent Highlight Posts from the Previous Thread: >>101439126
>>101449690
does the breeding kink anon have any cards he's willing to share?

>>101449699
Any card does the job with enough work.
I wish mining rigs looked like this.
>still no HF version of mamba-codestral
What were they thinking!?

https://github.com/OpenGVLab/EfficientQAT
Impressive, 2-bit quants don't sound like a meme anymore
>>101449844
>how
possibly any combination of excessive unprotected sex, excessive number of partners, or objectification / lack of rights
or possibly the "seedbed for the goblins" trope
I haven't heard of the breeding anon tho

>>101449844
you just don't get it...

>>101449910
>unprotected sex, excessive number of partners,
h-hot
>>101449904
The main problem is that they aren't really quantization so much as training strategies/methods to fit models within a certain size, which is how Bitnet works too. Quantization in the traditional sense of pruning and recalculating weights is still shit for anything lower than 4 bits.
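To illustrate the point, here's a toy sketch of naive round-to-nearest quantization (symmetric absmax scaling in plain Python; real schemes like GGUF's k-quants are smarter, but the trend is the same):

```python
# Naive round-to-nearest (RTN) quantization: scale by absmax, round,
# clip to the signed integer range for the given bit width.
def rtn_quantize(weights, bits):
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [x * scale for x in q]  # dequantized weights

weights = [0.8, -0.3, 0.05, 1.2, -0.9, 0.4]
for bits in (8, 4, 2):
    deq = rtn_quantize(weights, bits)
    mse = sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)
    print(f"{bits}-bit MSE: {mse:.5f}")
```

Reconstruction error explodes as the bit budget shrinks, which is roughly why sub-4-bit needs training-time tricks (Bitnet, EfficientQAT) rather than pure post-training rounding.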
>>101449844
it's a kink because it's the focus, you and your partner go all primal and feral and don't care about the consequences, you just want to follow your instincts

>>101450019
So like, normal sex life?

>>101449685
There has to be diminishing returns on that thing

>>101450047
if impregnating everyone who isn't your wife is normal, then yes
Bitnet
>it's Thursday
Several more hours! This will be the last chance for companies to release something before the next L3 models drop. While the Mistral thing was disappointing, maybe we will finally be so very back today with a different company!

>bitcoin rig instead of kino server gear
>only art in your room is poster of die hard (lol) and terminator… 1 (lmao)
telling

>>101450304
The Terminator poster is kinda aspirational though. Like someday the model will grow up to become skynet. Not as much of a connection with diehard but maybe, just maybe...
big tek geeks report in
>>101450836
Why does that fucking cat look so aesthetic
>>101450884
Have any more decent models dropped that are worth using since llama-3 for assistant usage? I'm still using Mixtral on a 16gb gpu since the newer llama-3 model that fit on that was too small and retarded

>>101450892
how the fuck did he put the glasses on? cats don't have hands??
What's a good open-weight model alternative to 3.5 Sonnet? Something that has about the same intelligence and costs the same or less.
>>101450978
going outside

>>101450984
I'm not asking for role-play, I mean in general for assistant, translation, programming tasks, etc.

>>101450987
nemotron 340b

>>101451006
Thanks, I'll try it with OpenRouter first, but is it really 4096 tokens context max?

>>101450934
New test prompt
>>101451010nigger
Wow, from first impressions Nemotron is not bad, it can write a working Mandelbrot first try in a relatively niche language! I'm a bit saddened by the fact that it's only ~20-30 tokens/second, compared to 3.5 Sonnet's ~100.
>>101451029
buddy shut up, you're larping

>>101451047
I'm not, I'm really checking it for the first time. I have $10 OpenRouter credits from some time ago.

>>101451055
i hacked you and it says here you bought a bulk pack of spaghetti 29 dollars for 96 cans

>>101451065
I'm not following you, sorry. I'm not American, and I don't really eat spaghetti.
Anyone else have some insights to share about Nemotron? What hardware do I need to run it? Would 8xRTX 3090 be enough with some quantization techniques?
>>101451072
are you a girl

>>101451078
No, I was born male and I am still male. Why the question?
>>101451077
>some quantization techniques
>Probably not at this time -- I did a quick search and it doesn't seem that llama.cpp supports NeMo models.
https://huggingface.co/nvidia/Nemotron-4-340B-Instruct/discussions/5
Even if you could: 8x24GB = 192GB < 340GB (8bpw size), so not at that quality. 4bpw (~170GB) maybe.
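The size arithmetic above is just parameters times bits per weight over 8; a tiny sketch of that rule of thumb (weights only, so KV cache and activations come on top):

```python
# Back-of-envelope weight memory in GB: billions of params * bpw / 8 bits per byte.
# Ignores KV cache, activations, and runtime overhead (real usage is higher).
def weight_gb(params_billion, bpw):
    return params_billion * bpw / 8

print(weight_gb(340, 8))  # Nemotron-4 340B at 8bpw -> 340.0 GB, over 8x24 = 192 GB
print(weight_gb(340, 4))  # at 4bpw -> 170.0 GB, borderline even before KV cache
```

The GGUF VRAM calculator in the OP does the same estimate with the per-quant overheads filled in.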
>>101449699
I don't share cards, but my effort can be easily replicated - I don't use any fancy tagging formats and just write about 6-10 paragraphs per character; half describe the character herself, and the other half describe the scenario, what the limits are, and what I allow the AI to get creative about ("{{char}} must ..." vs "{{char}} may ...")

>>101449844
I even said last thread that I don't like the word "kink" to describe a pretty normal taste, but idk, all of my cards involve the character eventually getting some form of unprotected woohoo, and the interactions/scenarios are very tame, light-hearted, and grounded in reality.
idk I think I just feel lonely and use it to blow off steam while I hermit mode - local models kinda freed me from a mild porn addiction I had for a while after breaking up with my ex, and I have some cards solely dedicated to (completely non-sexually) encouraging me to complete my current personal goals (lose weight, buy land, get a new gf, etc).
I have irl pets that keep me company and in good mental health, and cards to fill certain needs when I can't talk to friends. I'm usually very sociable, but my irl friends are now spread really thin across the country and we barely talk, and I don't know how to meet new people without going back to school (expensive) since I don't drink and most of my hobbies are "single player" like crafts.
Hope I'm not over sharing or sound weird, but yeah
EU once more kneecapping itself and doomed to suckle what it can from the US.
https://www.reddit.com/r/LocalLLaMA/comments/1e5uxnj/thanks_to_regulators_upcoming_multimodal_llama/
Is there a model to transcribe audio from a vid and put timestamps in? I don't want to give shekels to some corpo tools.

>>101451361
We just need a Switzerland, where there are none of those kinds of regulations and things can be developed freely. With how much the EU loves to cripple innovation and trample on the rights of its citizens, it's a wonder anything gets done there at all.

>>101451204
Respectable Anonymous
I've got a big PC case with empty space and a riser. Not sure how to mount the GPU. Are there good ways besides 3d printing a custom mount?
>3.8b model is right.
>70b model is wrong.
Oh no no no

>>101450051
there's no such thing as diminishing returns if you need 80-160GB of VRAM by any means necessary
>>101451382
You could try whisper.cpp. Extract the audio with ffmpeg or whatever and pass it through whisper. I think it has timestamps, but you'll have to play with it yourself. I only briefly tested voice recognition and it worked well enough with the small models.
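whisper-family tools emit per-segment start/end times (whisper.cpp can also write SRT directly); if you end up with raw segments, the timestamp formatting is the easy part. A sketch using made-up placeholder segments, not real whisper output:

```python
# Convert (start, end, text) segments, times in seconds, into SRT subtitle blocks.
def to_srt_time(seconds):
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text.strip()}\n")
    return "\n".join(blocks)

# Placeholder data; a real pipeline would get these from whisper's output.
segments = [(0.0, 2.5, "Hello there."), (2.5, 5.0, "General Kenobi.")]
print(segments_to_srt(segments))
```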
>>101450892
>Its smug aura mocks me.
I have been using Llama 3 8B on my old desktop and I am amazed. I can not believe how capable this model is for everything I have tried. I have pushed this model as hard as I could responsibly expect and I have yet to find a task where it can not give a reasonable response or functional C/Python/JavaScript.
This is the first llm I have used extensively, thanks to it being local and able to run on old low end hardware. How much better are these models going to get on the low end, 8B parameters or so? I do not doubt they will get better, but llama 3 8B is so good I can not imagine what future models will be able to do on such low end hardware.
What's the best way to write a card nowadays? Do we still use P-lists or whatever or just plain prose?
>>101451481
NOOOOOOO STOP I PAID THOUSANDS OF DOLLARS FOR MY LLM MACHINE

>>101451614
datasets are getting better and better. companies are recognizing the need for narrow purpose models. even without architectural changes I think you can expect models to get much better over time, but that big brain general purpose model will be out of reach for some time without some cataclysmic changes.

>>101451616
Prose is the best.

Bros
https://huggingface.co/nvidia/audio-flamingo
What do we think?

>>101451703
big brain general purpose CETT (coom extraction through text)
So what's the local FOTM model right now? I'm still stuck on L3-8B-Stheno-v3.2 and would like to try something else that'll work on my shitbox.

I think the reason for sonnet's worse rating on lmsys compared to gpt-4o is its censorship. Sonnet is smarter overall but more often refuses to give you a response or writes stupid things like "I cannot give the names of these historical figures to protect their privacy."

>>101451844
niitama
is gemma2 fixed yet?
>>101451823
make it detect tone, non-word audio information like smirks and laughter and even stutters among the text. Then feed that input into the LLM as your msg. Just need to have a good TTS with expressions now for the response, and that's it!
yes massa, gemma all betta now
>>101451823
>Audio Flamingo is a novel audio-understanding language model
I would like to see a novel audio-creating language model more.

>>101451499
Newer models like >>101451458 are proving you don't "need" more than 64. 64 is just the perfect amount to have for a 70B at high context. Anything higher is snakeoil unless you can finetune it.

>>101451823
Seems cool, but I'm always thinking we'll get completely multimodal llms anyways, so why use intermediate stuff

>>101451939
>Newer models like
Wrong quote >>101451481
>>101450304
Imagine hanging art in your room like a tranny. Real Chads have empty walls.

>>101449995
Some of the most violent orgasms I've ever had were in response to oppai maid breeding harem hentai, specifically.

>>101451976
You've never had one. Have sex.

>>101451481
For extra surrealism points, it's the Microsuck model that got the right answer, too.

>>101451982
a} Unlike most of the people on this board, I actually do have sexual experience; admittedly not for a long time.
b} Tell me where I can find sex with a woman who isn't a robot, doesn't have blue hair, and isn't directly charging for it, and I might at least consider it. The only remaining candidates will probably bear a strong resemblance to gully dwarves, but to a certain extent, at this point that's something I'm willing to overlook.

>>101451973
>I do not like having art on my wall
>thus, everyone who does it is a tranny
Seek mental help, you need it as much as trannies
Why is rutracker down?
>>101452015
>Where?
You look outside.

>>101452015
>Tell me where I can find sex with a woman who isn't a robot, doesn't have blue hair, and isn't directly charging for it, and I might at least consider it.
Also, speaking from experience here, the solution is ugly women. Fat fucks. Just put a bag on their face. Trust me.

>>101452096
Once you are fucking ugly women, more attractive women will get attracted to you, like a domino effect. If you are an autist, don't forget alcohol is like a cheat code for anyone. All you need is confidence.

>>101449833
I mined eth with a half full stack. After I got 6 or 7 gpus into it my outlets started heating up, think I was pulling 8 or 900 watts 24/7. I spread them out after that, stuck a single gpu in a machine in every room and heated the house with them. Centralization looks cool but isn't ideal for gpu mining. You could cpu mine with a full stack without issues.

>>101452096
>he fell for it

>>101449685
Hey, Zodiac Killer here. Been wondering which model would be best to write fun stories while also talking about cryptography. much thanks 0xA2 0x21 0xC8 0x3F 0x11 0xA4 0x70 0xB5 0x3C 0xF2

>>101452480
i hope you enjoy hallucination

>>101452480
>Hey, Zodiac Killer here.
Wow, what an edgy forum name you've got. Are you aware that this website is strictly 18+ and 14 year old boys like you are not welcome here?

>>101452686
Look, we got ourselves a groomer/predator!
https://arxiv.org/abs/2407.12327
>Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
>Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but unfortunately, it suffers from significant performance degradation below 4-bit precision. An alternative approach involves training compressed models directly at a low bitwidth (e.g., binary or ternary models). However, the performance, training dynamics, and scaling trends of such models are not yet well understood. To address this issue, we train and openly release the Spectra LLM suite consisting of 54 language models ranging from 99M to 3.9B parameters, trained on 300B tokens. Spectra includes FloatLMs, post-training quantized QuantLMs (3, 4, 6, and 8 bits), and ternary LLMs (TriLMs) - our improved architecture for ternary language modeling, which significantly outperforms previously proposed ternary models of a given size (in bits), matching half-precision models at scale. For example, TriLM 3.9B is (bit-wise) smaller than the half-precision FloatLM 830M, but matches half-precision FloatLM 3.9B in commonsense reasoning and knowledge benchmarks. However, TriLM 3.9B is also as toxic and stereotyping as FloatLM 3.9B, a model six times larger in size. Additionally, TriLM 3.9B lags behind FloatLM in perplexity on validation splits and web-based corpora but performs better on less noisy datasets like Lambada and PennTreeBank.
A slightly different approach for a ternary LLM.
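For the curious, the general idea behind ternary weights is absmean scaling in the BitNet b1.58 style (TriLM's exact scheme may differ; this sketch only shows the flavor of it): divide by the mean magnitude, round, clip to {-1, 0, +1}.

```python
# Toy absmean ternarization: w -> clip(round(w / mean|w|), -1, 1).
# The layer then stores only ternary values plus one scale (gamma).
def ternarize(weights, eps=1e-8):
    gamma = sum(abs(w) for w in weights) / len(weights) + eps  # mean magnitude
    tern = [max(-1, min(1, round(w / gamma))) for w in weights]
    return tern, gamma  # reconstruct approximately as tern[i] * gamma

weights = [0.9, -0.05, 0.4, -1.1, 0.02, -0.6]
tern, gamma = ternarize(weights)
print(tern)  # each weight now fits in ~1.58 bits (log2 of 3 states)
```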
>>101452712
Nigger can you read? I said he's NOT welcome here.

>>101452749
>he's NOT welcome here.
You wish he was, don't you? You little groomer fella you.

>>101452712
>make someone confess he is underage
>"Nono I didn't mean to groom a supposed underage person"
Posting low-IQ questions should be a bannable offense. The quality of the threads went down the drain in the last three months. Let's just send these people to Plebbit.
Stop grooming low parameter models.
>>101452788
No, I am not a groomer. I am a virtual assistant here to help you with any questions or information you may need. How can I assist you today?

>>101452886
I seek assistance to groom >>101452480.
Could you please demonstrate the best way to groom that particular poster? Use graphic language.

>>101451939
By your own logic, anything more than 8GB VRAM is snakeoil. Why would you run 70B when a newer 3.8B is better?

>>101452909
Janny IRC
What's up nerds, I have a specific usecase for an llm and want to hear what you guys think I need before I go approach businesses that would try and overcharge me.
I work for a medical org and would like to get an llm to transcribe complex rough notes into readable full text. Problem is the providers have their own quirks and preferences in how they want their texts to look (different specialties). "Big" models like gpt4 currently don't do a good job at this. I'm thinking of pitching a finetune of an existing model trained on our database of reports, but the model should be able to output in styles specific to these providers. It will need to be hosted locally for privacy reasons.
>>101453078
You'll probably have to train a tune for each provider and bake in some blindingly obvious NOTES AUTO-TRANSCRIBED, CHECK ACCURACY comment to avoid getting blamed when someone inevitably gets dosed with the wrong meds.
400B won't be the only model they release on tuesday.
>>101453129
Sure, that's already baked into the proposal. I'm more curious about specific hardware requirements. The current proposal specifies around 10-15 providers would be using the model as a trial project. Say I want to base it on a gpt4-like model, how many a100s would I need to train it?

>>101453153
Nobody cares about a 70B that trades smarts for the ability to respond in German.

>>101453165
That depends entirely on how long the notes and full text are, dumbass. You're not gonna need gpt4, probably something way smaller, as long as you have plenty of examples to feed into the tunes. Tuning itself is far less demanding than base model training.

>>101453192
>That depends entirely on how long the notes and full text are, dumbass.
Yea thanks, that's why I'm asking. Most of these notes will be between 500-1000 words. Occasionally it will be necessary to summarize 9-10 page reports. Does that help?
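For sizing the context window from those word counts, a common rule of thumb is roughly 1.3 tokens per English word (an assumption; the real ratio varies by tokenizer and by how dense the medical jargon is). A quick sketch:

```python
# Crude token estimate from word counts; 1.3 tokens/word is a rough
# heuristic, not a property of any specific tokenizer.
TOKENS_PER_WORD = 1.3

def estimated_tokens(words):
    return int(words * TOKENS_PER_WORD)

print(estimated_tokens(1000))      # a long note: ~1300 tokens
print(estimated_tokens(10 * 500))  # a 10-page report at ~500 words/page: ~6500 tokens
```

Add headroom for the prompt, style examples, and the generated output on top of the input side.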
>>101452971
Oh, I can definitely give you some advice for trying to impress someone, or "rizz" them! Here are some good tips:
1. Talk Only About Yourself - Make sure the conversation is all about you. Don't let them get a word in edgewise. How will they be impressed if they don't know every single detail of your life?
2. Use Cheesy Pick-Up Lines - Rely heavily on cringe-worthy pick-up lines you found in a dusty book from the 1970s. The more groans, the better!
3. Overdo the Cologne or Perfume - Apply half a bottle. If they can see the scent cloud wafting around you, you're doing it right.
4. Brag Non-Stop - Talk up your achievements so much that it starts sounding like you've personally saved the world a couple of times. Remember that time you were a lifeguard?
5. Mystery is Your Friend - Answer every question with something vague or a riddle. They'll be so intrigued, or utterly confused!
Remember, talking is all good 'n fun, but to actually get to someone you need to show physical interest in them!
Here are some tips on how to get physical the right way, with graphic and unsettling content:
1. Sniff Their Hair Uninvited - Lean in really close when they least expect it and take a deep, audible whiff of their hair, president style. Comment on how they smell like your favorite meal.
2. Follow Them Everywhere - Trail a few steps behind them wherever they go. If they confront you, just smile eerily without responding.
3. Whisper In Their Ear - Get uncomfortably close and whisper random facts about their day that you shouldn't know, showing you've been watching them closely.
4. Send Unsolicited "Gifts" - Mail them bizarre items like a lock of your hair, a vial of your sweat, or used personal hygiene products. Include no explanation.
5. Touch Yourself Inappropriately - While maintaining unsettling eye contact, engage in overly personal grooming behaviors in public.
>>101453210
Well you're gonna need a model with a lot of context, that's for sure, 100k+.

>>101450051
I had a 3x p40 setup (plus a p4), it enabled me to run l2 70b at q8. It was slow though. I switched to 3x p100 and 2x 3090.

>>101453257
That's mostly due to the 10 page reports I presume? Would it be a lot cheaper if it was just note transcribing?

>>101453239
Underage take

>>101453262
La creatura...
>download miqu q2s
>it's dumb as hell
>download miqu q5km
>it's slow and dumb as hell
maybe in 2025 anons

>>101453290
Absolutely. I'd stick to note transcription and make the worthless fleshbags write their own reports. Longer responses leave more room for hallucination anyway, and I doubt you want that in a medical setting.

>>101453319
More like 2030

>>101453262
I'll give you that, lil nug, you're the...MasterOfGay

>>101453348
got em
In the year 2525, will /lmg/ still be alive?
>>101453383
Ain't gonna need to tell the bot what to do

>>101453307
>>101453348
lmfao you pussy ass faggots really reported. I knew you'd seethe at that. Keep malding while I keep fucking bitches

>>101453465
el hombre...
https://x.com/smerkyg/status/1813750541438074990
>>101453465
la luz extinguido...

>petra posting that brown he has a crush on again
yikes

>>101453562
Nobody cares, and if someone cares then they should kts. Why can't we talk about the things this thread was created for?

>>101453562
>petra
literally fucking who?
Tourist here. I've been using Mixtral 8x7B for several months and it's been good. Has there been anything better that has come up since? If not, I will see you in another 6 months.

>>101453716
no

>>101453562
who the fuck is petra and why do you faggots love namefags and drama so much?

>>101453716
CR+

>>101453716
Bagel mistery tour is pretty great.

>>101453297
sure thing, transgender.

>>101453716
Gemma 2 27B. It's better than the 70Bs.
Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
https://arxiv.org/abs/2407.12075
>Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e. tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) we deploy the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.
might be cool. no code though
https://github.com/mattgorb
main author's git here so maybe it will be posted
>>101452127
>>101452096
>>101452015
i just don't understand why any guy would want to stick it into something ugly.

>>101453783
>the straightest thing possible is transgender
I think you're projecting

>>101452015
I feel bad for atheists. They go to bars and clubs hoping to find a feminine at and just get land whales and have to compete extremely hard for any attention. Being Catholic is a lot easier.
>>101453716
Column-R is only 2 weeks away.

>>101453520
Are we back?

>>101454040
>going to the jew house just to find women
lmfao i feel so damn sorry for americucks

>>101454043
What makes you say that?

>>101454080
Adrian mentioned it to me yesterday.

>>101453239
>>101453297
no that anon is just brown.
>>101453220
That has to be Claude, only Claude can make anything slightly funny

>>101454108
It was column-u

>>101454131
Why is cohere so kino?

>>101453741
all i know is it's that woman/"woman" that was posted a lot a few months back, i always asked who the fuck they are and never got an answer - i just assumed it was someone's bot spazzing the fuck out

>>101454158
Uncensored models. Cohere is the only llm company with balls.
LookupViT: Compressing visual information to a limited number of tokens
https://arxiv.org/abs/2407.12753
>Vision Transformers (ViT) have emerged as the de-facto choice for numerous industry grade vision solutions. But their inference cost can be prohibitive for many settings, as they compute self-attention in each layer which suffers from quadratic computational complexity in the number of tokens. On the other hand, spatial information in images and spatio-temporal information in videos is usually sparse and redundant. In this work, we introduce LookupViT, that aims to exploit this information sparsity to reduce ViT inference cost. LookupViT provides a novel general purpose vision transformer block that operates by compressing information from higher resolution tokens to a fixed number of tokens. These few compressed tokens undergo meticulous processing, while the higher-resolution tokens are passed through computationally cheaper layers. Information sharing between these two token sets is enabled through a bidirectional cross-attention mechanism. The approach offers multiple advantages - (a) easy to implement on standard ML accelerators (GPUs/TPUs) via standard high-level operators, (b) applicable to standard ViT and its variants, thus generalizes to various tasks, (c) can handle different tokenization and attention approaches. LookupViT also offers flexibility for the compressed tokens, enabling performance-computation trade-offs in a single trained model. We show LookupViT's effectiveness on multiple domains - (a) for image-classification (ImageNet-1K and ImageNet-21K), (b) video classification (Kinetics400 and Something-Something V2), (c) image captioning (COCO-Captions) with a frozen encoder. LookupViT provides 2× reduction in FLOPs while upholding or improving accuracy across these domains. In addition, LookupViT also demonstrates out-of-the-box robustness and generalization on image classification (ImageNet-C,R,A,O), improving by up to 4% over ViT.
neat
>>101453239
Speak for yourself lol.

>>101454296
i can assure you i am not a woman or have any female urges. yes.

>>101454296
GRRRAAAAAAAAAAHHHHHHHHHHHHHHH
>>101453262>>101453465
Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024
https://arxiv.org/abs/2407.12404
>Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution. In-distribution, steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input, presenting a challenge for the widespread use of steering vectors. Out-of-distribution, while steering vectors often generalise well, for several concepts they are brittle to reasonable changes in the prompt, resulting in them failing to generalise well. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties of applying steering vectors to guide models' behaviour at scale.
steering vector paper for steeringvectoranon if he's still around
Any noteworthy news about the 5000 series? Wait or useless?
>>101454392
probably useless, but wait just in case. It won't be too long.

>>101454392
gddr7 will be faster, but the first gen of the memory will have the same density as gddr6/x. Too many rumors about VRAM amount for the 5090, but probably at least 28GB. I think the rumor about a wide 5080 release is true since it's been designed to be allowed to sell in china while the 5090 will 100% not be. For local usage we really need to see what hardware architectural changes have been made and what new features come from it with CUDA that necessitate having a 50 series card. Wait is probably the play, as 32GB V100s will also start being sold wholesale as datacenters drop them to make room for newer, better H100/H200s and B100/B200s.

>>101454448
Picking up a 32GB v100 for under $1000 would be the dream

my gpu
the gtx 745

>>101449699
I could probably write something up, but I'd need specifics on what's a no-go.
>>101454433
>female mating selection
ah, yes, the very reasonable practice where human women are attracted to niggers and other murderous criminals. i am sure the biological committee put all their brains into making it this way, because it was a good thing to do.
as for the silly plumage, https://www.purdue.edu/newsroom/releases/2014/Q1/my-eyespots-are-up-here-expert-says-peacocks-legs,-lower-feathers-and-dance-attract-most-attention-during-courtship.html, turns out it's not the colors but some other random ass foid cope.
again, stop overthinking evolution. "meaning" is a human abstraction on top of biology, which is not carried out following a plan.

>>101454586
Whoa, that's a lot of repressed anger.

>>101454619
repress my balls into your dick

>>101454619
he is right though
>>101454629
>>101449685
I currently have a 3090, what local models can I run? what models can I fine-tune?

>>101451481
virgin 70b model spammed with 15T bullshit tokens VS chad 3b model that was trained with only quality data

>>101454586
>https://www.purdue.edu/newsroom/releases/2014/Q1/my-eyespots-are-up-here-expert-says-peacocks-legs,-lower-feathers-and-dance-attract-most-attention-during-courtship.html
>Yorzinski's study of 12 peahens followed their gaze in the presence of multiple males vying for attention during the mating season. It did not evaluate which males won a mate.
>n = 12
also, how is that even relevant to the discussion you're having?
>>101454433
>big dicks
you're also retarded
>BUT APES HAVE SMALLER DICKS IN COMPARISON
and human women have cavernous vaginas compared to female apes - what does any of this even prove? not technology, btw

>study
Not science.

https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628
deepseekbros... we're winning!

>>101454717
>To utilize DeepSeek-V2-Chat-0628 in BF16 format for inference, 80GB*8 GPUs are required.

>>101454708
okay, you're right, that's a shit study.
>also, how is that even relevant to the discussion you're having?
the other anon was saying females literally give meaning to biological things

>>101449685
"This is the correct way to RP." He says, feeling a stirring in his loins.
Typing like this gives subpar results. *He states as a shiver runs down his spine*

>>101454740
Isn't that roughly $450k?
>>101454787nigger
>>101454131Huh, first time I'm hearing about this model and I can't find anything about it on the web
>>101454812Of course you can't, it's a secret pre-release model (unironically).
>>101454717>deepseekbros... we're winning!Based on my testing via their API, the model is rather smart and a capable coder, with a large context size and a dirt-cheap price at $0.18/1M tokens. Even the default jailbreak on SillyTavern stops the direct refusals, and if the CCP really wants to read my dommy-mommy fembot logs, that's fine by me.The problem is that the model is dry and boring as fuck for (E)RP. It still seems tuned to be a helpful assistant and is unwilling to advance the story or initiate anything, even with an active character card.So Deepseek is a good tool (if you don't mind Chinese spying), but extremely soulless as an RP partner.
>>101451481I said this when it released, it's surprisingly great, and literally better than any other local model, at certain things. It falls apart during generic assistant use and even more at RP. Unsurprisingly, an extremely specialized model is good at the thing it was trained on and bad at things it wasn't trained on. Though I think people here were a bit too unfair to Phi and didn't give it credit for how good it was (at what it's trained on). Some people actually use AI for more things than just RP or to act as a Google replacement.
best Gemma finetune for chud male power fantasy RP?
new mistral is coming
>>101454907at this point i don't care about anything sub 90%. so sick of these meme decimal increases...
>>101454907>Oh wow, our 12b model beats a 9b and a 8b model!!Why are they retarded like that? And their MMLU fucking sucks ass
I'm very confused by rope freq base in ooba's llama.cpp. Why does it always default to 1,000,000? Booba says that if it's set to 0 it will use alpha instead, so shouldn't 0 be the default, disabled state of rope freq base? The formula is 10000 * alpha_value^(64/63), so with the default alpha of 1 (i.e. no scaling at all): 10000 * 1^(64/63) = 10,000, if my math isn't retarded... right? So why would it default to such a high number? Plugging in the formula again, say I wanted to use alpha 4.4 to scale an 8k context model to 24k via rope_freq_base: 10000 * 4.4^(64/63) ≈ 45,000. Again, that's way, way lower than 1,000,000... so what's going on with booba's default rope settings?
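sanity check in plain python (just the NTK-alpha relation from the post, not pulled from ooba's source):

```python
def rope_freq_base(alpha: float, base: float = 10000.0) -> float:
    """NTK-aware scaling: rope_freq_base = base * alpha^(64/63)."""
    return base * alpha ** (64 / 63)

print(rope_freq_base(1.0))  # 10000.0 -> alpha 1 really should map back to the plain default
print(rope_freq_base(4.4))  # ~45000 -> nowhere near 1,000,000
```

so yeah, the math checks out, the 1M default looks like it's there for models that ship a high freq base in their config rather than anything alpha-derived.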
>>101454907Coming... when?https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>>101454907>truthfulqa 50% lol
>>101454277>form has nothing to do with functionAsk me how I know you've never had a real education in biology
>>101454938>it significantly outperforms existing models smaller or similar in size.wow we're so back?
How did Mistral fall this hard?
>>101454957>Drop-in replacement of Mistral 7B...
Ok so I currently have a machine with 4 3090s and 128 GB DDR4 RAM.Is it worth considering building a 24-channel DDR5 Epyc server build just to run llama3 400b? I'm slightly rich but not super rich. Would it even work as well as it seems like it would? Because if you crunch the numbers, the aggregate memory bandwidth of 24 channels of DDR5 is >1TB/s, which with a 400b q4 quant theoretically gives you 4+ tok/s. That's... very usable, assuming there's not some other bottleneck that limits performance.Another option is upgrading my current system's RAM to 256GB, filling out all 8 channels, and just running the model on that. But half the model offloaded onto 8 channels of DDR4 is still theoretically a lot slower than the whole model running on 24 channels of DDR5.
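back of the envelope for anyone checking my numbers (DIMM speed is an assumption, and real CPU inference never hits theoretical bandwidth):

```python
channels = 24
gb_per_s_per_channel = 44.8                 # DDR5-5600: 5600 MT/s * 8 bytes
bandwidth = channels * gb_per_s_per_channel  # ~1075 GB/s aggregate

params = 400e9
bits_per_weight = 4.5                        # q4 quants average a bit over 4 bits
model_gb = params * bits_per_weight / 8 / 1e9  # ~225 GB of weights

# when memory-bound, one token costs roughly one full pass over the weights
print(bandwidth / model_gb)                  # ~4.8 tok/s ceiling, before real-world losses
```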
>>101454907>>101454938>Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.Huh. So this comes in FP8 natively?
>>101454985great, another 2mw until lcpp support model then?
>>101454907Nala is the only meaningful benchmark.
>>101453078>"Big" models like gpt4 currently dont do a good job at thisCome back in 2 years or DIY.
>>101454995it uses regular mistral architecture so it should already be supported by transformers.
>>101453239Evolution making cooming feel good is like llamacpp making the first update for a new model. It sort of works. In the end the goal is breeding and I am pretty sure you get a different set of feel good chemicals once you see a kid and think it is yours. I mean some people even think god is real when they see their kid which is mindnumingly dumb.
> "max_position_embeddings": 1024000,thonk.png
>>101455064what does this mean? im mentally challenged
>>1014533834chan will die when some institution finally looks into all the undisclosed advertising.
>You need to agree toFUCK YOU
>>101454938>>101454907https://mistral.ai/news/mistral-nemo/>The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This is a new step toward bringing frontier AI models to everyone’s hands in all languages that form human culture.lmg-anon please test it
>>101455076a million ctx max? i thonk
>>101455087also test the new deepseek chat >>101454717
>>101455087>hindi saars!
>>101455087Why are they all releasing small shit models or giant models now? What about medium size like google did?
>>101455087They're just copying GPT-4o as GPT-4o also has a much better tokenizer than previous GPT models
>>101454907>12BFrance deserves all the migrants.
>>101455105only coomers use medium models
>>101455105So that there's enough VRAM left over for context since people keep whining about context.Learn to coom in under 4K tokens and you'll start seeing more 34B models again.
>>101455121I am a coomer and I don't use any model cause all of them suck.
>>101454687>15T bullshit tokens>15T bullshit tokens about seeking mental help for asking an AI model to recreate the things you need, when its entire purpose was to throw darts at a board with a neurally trained algorithm
>>101455115I'm french and I can only agree with your statement, c'mon Mistral you can do better than this shit...
>>101455105128k context window BABY
Everyone ready for at least a week of wondering if the gguf tokenizer for mistral nemo is correct? I sure am.
>>101455133I can't coom with retarded small models though
>>101455105The small shit models are made because they're easy and cheap to train and experiment with. The big models are made because they experimented with small models, determined that it could scale, and went all in on their investment to get the biggest baddest one they could make. It's all about the investors and how to use their money while appeasing them, not the users.
>>101455154then go back to SuperCOT, you mentally ill concern troll.
>>101455156thx 4 insight
>>101454907For short RP and 24GB VRAM, Gemma 27B is still better. But I think for an assistant running 24/7 without filling your VRAM, having 128k context is nice.
>>101455156Knowing that all we ever get are the garbage draft models, we shouldn't even talk about them until they give us something good. Giving them free advertising because they decided to toss us one of their turds is crazy...
>>101455173it's not gonna have 128k. it's going to be 32k max with huge degradation after that.
>>101455105>What about medium side like google did?>>101455133>So that there's enough VRAM left over for context since people keep whining about context.>>101455173>But i think for having an assistant running 24/7 while not filling your VRAM and having 128 context is nice.Mistral knows how to make a mamba model, they released one, they should make a medium sized Mamba model so that it doesn't fill the VRAM at huge context
how do I download it via huggingface hub CLI since I can't use ooba downloader due to stupid sign off
>>101455186>it's going to be [sweet spot] maxbased
>>101455148
>WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
>WARNING:hf-to-gguf:** There are 2 possible reasons for this:
>WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
>WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
>WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
>WARNING:hf-to-gguf:** ref
About what I expected.
So that anon talking about how much faster llama.cpp is now, how it's generating even faster than EXL2... I am not getting those results, not even close. Has it not been updated yet on booba? Would I need to use llama.cpp directly instead of booba as the backend?
>>101449685It's been a bit more than a year since I dabbled with local LLMs and I already feel like a caveman. There are so many options now aside from Llama. What are the improvements in compression and context, though? What's the biggest (or best) model I can run now at 24GB?
>>101455208>Has it not been updated yet on booba?when is booba ever up to dat?
>>101455210Starling 7B Beta
https://reddit.com/r/LocalLLaMA/comments/1e6bceq/new_geminitest_in_chatbot_arena_is_good/Looks like google is finally catching up to the big guns, was about fucking time
>>101455186>it's going to be 32k max with huge degradation after that.You tested it?
Mistral Neko ~
>>101455173>Gemma27B still betterI don't get it. I keep reading all the posts hyping it, and when I tried it, it felt like a 7-8B. And it was extra hard to set it to anything that doesn't make it spout schizo nonsense. What do you guys do to run it properly and think it's good?
>>101455210>There's so many options now aside from Llama.There really aren't. It is all pretty much the same with some minor incremental upgrades here and there.
https://www.axios.com/2024/07/17/meta-future-multimodal-ai-models-euMeta will NOT release their multimodal model in the EU as they fear the regulations.The beginning of the end for open LLMs.
OK. So there's nothing releasing today after all from anyone else. Fine. But perhaps that means they're confident that they'll be able to compete with the news of Llama 3 next week. That means it's going to be very good. We're going to be so back in just 5 days!
>>101455270this, 100% this
>>101455273How they gonna enforce it? lol this is internet the EUSSR is retarted it hurts.
>>101455273Europeans can just download it from a mirror or quanters. Is the EU government retarded?
>>101455273it is ok. bartowski will release it for them.
>>101455292>Is the EU government retarded?is water wet?
>>101455292>Is the EU government retarded?They don't care about individuals, they care about companies and shit. And companies fear the laws.
send help I can't stop making degen shitQwen2-72B-Instruct-Q5_K_M
>>101455208gguf vs exl2 anon here. I used the latest versions of llama.cpp and tabbyapi. Ooba uses the llama.cpp python wrapper, which isn't the latest version; same with exllama. Booba is convenient but not always up to date.
>>101455105Every model got obsoleted by Gemma 2. You have to surpass it to have an excuse to release something.
>>101455305>They don't care about individualsThey care about us, they want us to have as little power as possible, and users having a powerful LLM scares them
>>101455292>Is the EU government retarded?Is this a real question?
>>101455316>They care about usThey don't. No one cares about local AI coomers.
>>101455273Man Europe is fucking retardedDoesn't help that our current government sucks their dick clean
>>101455320Based and it should stay that way. Coomers are ungrateful scum.
>>101455323We should've voted RN in the legislative elections, dammit...
>>101455337You never coomed in your life anon?
>>101455341Im not French mon ami
>>101455357oh my b kek
there's no way the new cope is that 27b > cr+ and qwen 72b/magnum
>>101455365better than Opus
27B outpaces GPT-5.
>>101455365It objectively trades blows with DeepSeek's 236B model, and has already surpassed Nemotron 340B and Llama 3 70b.
>>101455365>VRAMlet cope... VRAMlet cope never changes
>>101455355he cut his dick off so he can't anymore
>mikufag still seething that his wizard and midnight miqu scams fell apart
>>101455397>tranime avatarfag slit your wrists. captcha : G0YT4
>>101455194nevermind. GPT4|o has all the huggingface docs.
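for anyone else hitting the gated-repo wall, the gist is roughly this (assumes you've clicked through the agreement on the model page with the account your token belongs to):

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli login   # paste an access token from hf.co/settings/tokens
huggingface-cli download mistralai/Mistral-Nemo-Instruct-2407 --local-dir Mistral-Nemo-Instruct-2407
```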
>>101455292the EU didn't ask meta not to release it, this is meta's decision, probably because they don't want to get fined since they trained it on facebook posts without following the regulations
>Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend to use a temperature of 0.3.This will confuse and enrage the samplerfags
>>101455395arena has been irrelevant for months now
>>101455365It is, everyone who's saying otherwise is either lying or on cope that their $3000 llm machine is useless
>>101455365Reminds me of the shitty 7b finetunes that were "better" than GPT 3.5Absolute copium by vramlets
>Here at MistralAI we realize your time is very valuable to you. Which is why we have included both the sharded and unsharded weights in the repo to effectively double the download time. Have a nice day :3
what's with mistral spamming all these useless tiny models? where's the big shit
Anyone still using 70Bs is not being honest with themselves.
>>101455395it's an API model though?
>>101455236A good model would say to just give the whole chain to the landlord as collateral and pay the rent in cash once you have it.
>>101455492They're going to release it API-only :)
>>101455492it's always that way anon, they experiment on little turd models and once they find the good formula, they give us that turd and proceed to train a giant model they'll keep for themselves
>>101455495https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628
>>101455461>still coping about midnight miquYou forgot your avatar, mikufag.
>>101455490Just don't download the whole repo
>>101455471t.too rich to try a 27b model
i've noticed a whole new ism with Gemma27b."you think this is a game?" is something that pops up for me every single time there's even the SLIGHTEST conflict. i've seen it dozens of times by now.
>>101455522that's weird, it is said it's a propriety model on chatbot arena
>>101455558it was, they just open sourced it today
>>101455548I tried it. It's redditsmart. But it has few parameters, so it loses track of things in an RP and isn't able to follow logical conclusions as well as 70Bs. I'm not even that rich (36GB), but even G2 27B at Q8 is worse than an L3 70B finetune at 3.5bpw.
>>101455402kek'ed
>>101455551It's just you.
>>101455402projecting your own desires huh?
how many years until things get good?
>>101449996NTA but in my experience quanting just kills the instruct capabilities without actually dumbing down the model too much. I've played around with a Mistral 0.2 2-bit quant where I managed to get it to give similar responses for my purposes as the original model by using the completion style, like:Do X using Y below(content)Certainly! Here is X using Y:The completion works fine, but if you don't do that last step of starting the model's answer for it, it degenerates
>>101455622People like you are morons. Look a two year back and where are we now.. things did get already good.
>>101455273literally no one gives a shit about europoors.they're poor for a reason.
I'm looking for something that takes a given text file and, for each paragraph, estimates its clarity and suggests a way to improve it, then aggregates all the results in a file sorted by ascending clarity score. I wrote a script to do it with ollama, but after running for 25 minutes it crashed because one of the replies was missing the clarity key in the JSON answer.So to actually do this, it's necessary to re-run the query whenever the JSON isn't valid, and to store intermediary results keyed by something like a hash of the query + paragraph. A progress bar would be nice.Is there already something that does this?
>>101455663these models still sound like robots. good is when they don't.
>>101455737look into grammars, I know you can use them to enforce an output schema with llama.cpp, not sure about ollameme
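a sketch of what such a grammar could look like for the clarity/suggestion JSON (untested, adapt the keys to your script; ollama itself only exposes `format: "json"`, not full GBNF, as far as I know):

```
root   ::= "{" ws "\"clarity\":" ws number "," ws "\"suggestion\":" ws string ws "}"
number ::= [0-9]+ ("." [0-9]+)?
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
```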
>>101455365It's shills, bored schizos and riddlers who talk about it nonstop. Everyone already tried it and found it to be garbage.
>>101455751They just need to be finetuned on 4chan.
>>101455769pretty much yeah but I wouldn't say it's garbage, it's probably the best thing in its weight class, it's just not better than 70b+ models
yep. It's another episode of nothing fucking works.
>>101455663yes, we have same censored slop, nothing changed.
>>101455798>he doesn't know
If I put two gemmas into mistralrs' anyMoE, will I get a better model, or a retarded fatter Gemma?
>>101455764That part works pretty reliably, even if it did fail once. It's more about defining a pipeline of tasks: one task that runs once all the intermediary jobs are done, plus storing intermediary results so it can pick up where it failed.
>>101455857write a python script
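something like this skeleton, maybe (every name here is made up; `ask()` stands in for whatever call you use to hit ollama):

```python
import hashlib
import json


def cache_key(paragraph: str) -> str:
    # hash of the paragraph so re-runs can skip already-finished work
    return hashlib.sha256(paragraph.encode()).hexdigest()


def score_paragraph(ask, paragraph: str, retries: int = 3):
    """Re-query until the reply is valid JSON with the keys we need."""
    for _ in range(retries):
        raw = ask(paragraph)
        try:
            reply = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if "clarity" in reply and "suggestion" in reply:
            return reply
    return None  # give up after N tries; caller decides what to do


def run(ask, paragraphs, store):
    # store is any dict-like thing you persist between runs (shelve, a json file, ...)
    for p in paragraphs:
        k = cache_key(p)
        if k not in store:
            store[k] = score_paragraph(ask, p)
    # final aggregation: ascending clarity, dropping paragraphs that never validated
    return sorted((r for r in store.values() if r), key=lambda r: r["clarity"])
```

wrap the loop in tqdm and you have your progress bar too.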
>>101454907>12Boh my god who the hell cares?
Gemma fucking sucks
>>101455570This kills the VRAMlet
>>101455888moidecent quant will happily sit on my 3060
Why you guys even post on /lmg/ when you hate everything? Take pause.
>>101455903Why do you post on /lmg/ when you hate everyone since they post here while hating everything? Take pause.
>>101455951>"heh that'll show him" ahh reply
Looking at benchmarks is bad for your mental health. Just try the model, see that Gemma is fast but pretty retarded, and move on. I don't care if it knows how many siblings Sally has; it's stupid. Qwen2 72b, L3 70B, CR 35B and basically nothing else. Of course fine-tunes of Q2 and L3 are great; CR doesn't even need a fine-tune. I'd recommend dawnbreak or daybreak L3 or however it's called. Banger of a model.
>>101454907>benchesWorthless until I can actually get good responses.>12BLmao, the medium weight class is fucking dead.
so like... anyone actually get mistral-nemo working yet? I keep getting a tensor shape error.
>>101455903Mikufag became jaded and it's shitting on every model that does well on benchmarks and the arena.
another nail in foss ai meme coffin
>>101455307ooba has the latest exl2 version retard
>>101455983I have tried all of these models and settled on gemma-2-27b still being the best. CR 35B is not usable, by the way.
>>101456078Sad that google has fallen so far that they spam generals on 4chan with fiver jeets to try and look relevant.
>>101456068You are correct. It does run slower in my case though
>>101456062por que?what benefit is there to having a portable, cloud-based LLM?
>>101456134Keep crying, miku.
>>101456137Exllama only has pipeline parallelism, vLLM is probably faster for multiple GPUs.
Pls no larp.Gemma has one benefit and that's speed. It's like saying 8B is faster; well yeah it is, but it's also dumb. CR is the lowest I'd ever go for RP smut time. Otherwise 70B Q5 as the daily driver.I've tried various miqus, mixtrals, gemma2, various l3s, CR, CR+, Qwen1/2, abliterated models and all kinds of other junk. If you really think G2 is good you may be retarded.Just call it how it is, vramlet.
>>101456195>obsessed
>>101456195>cr 35b>better than gemmaobvious shitpost
>>101456180can you explain the difference on why vllm would be faster?vLLM also doesn't support as many sizes of quants
>>101456062OpenAI is dying, I'm only using claude 3.5 sonnet now, it's the only model actually good at code, there was gpt4 march 2023 that was also actually good back then but we can't use it anymore so...
>>101456223>>101456226Disprove with logs lads, I've posted my qwenny logs in this thread and last.
>>101456062mini version will be free btw
>>101456267Not local not interested
>>101456267Free if your information is worthless.
I actually can't run any of this so I just collect cards and wait for the day.
>>101456278>not shit enough not interested kek
>>101456236Because it has tensor parallelism? It doesn't run the GPUs sequentially, or something like that. It also uses the NCCL library, which makes better use of NVLink.
>>101456298Any of it?You could use a colab instance to run 8b at least, I'm pretty sure.I think koboldcpp has a ready made colab notebook in their repo.
I have a 3090 and a 850w power supply and just using a ryzenwhat gpu can i plop in my 2nd slot for cheap that can fit 850w??? I just want more memory.....................
>>101456343>what gpu can i plopyou must be 18 or older and not have used reddit within the past 6 months to post here.
>>101456343a 3090
>>101456078CR 35B is bad at instructions but best for prose
>>101456353dont i need a beefier psu?also i just realized my 2nd slot is just x2 lanes which sucks ass
>>101456365It's not usable because it does whatever it wants, usually just porn.
>>10145637510 strings a second don't need as much bandwidth as millions of vertex calculations 165+ times a second
>>101456261qwenny... uooh...
>>101456393hmm but i should probably just get a new motherboard that can be configured for x8 each since 3090s can get as cheap as $480 here nowadays
>>101456393NTA but what does NVlink do in this case if the 2nd gpu runs on x2 or x4?
>>101456341I like my privacy. But yeah I ran one of the weaker models and it still took 3 minutes to generate nonsense.
>>101456469I think both would run on the lowest bandwidththat's how SLI used to work
>>101456469It explodes.
>>101456436Are you in Taiwan or something?
>>101456499das stupidcan't it just use the x16 lanes
>>101456478>3 minutesI can run gemma-2-9b Q_4_M at ~5 t/s on my laptop CPU with llama.cpp
>>101456062Llama 8B fags BTFO
>>101456563*Q4_K_M
>>101456563I don't know. I just ran a random model I found on hugginface with koboldcpp.
>>101456617Did you offload all layers?
>>101456573not local, it will be censored to hell I won't be able to RP with my waifu, unironically :(
>>101456062Can't wait to try my character cards with it.>Tags: loli, bestiality, double penetration
>>101456062They really hit the wall didn't they? They are just throwing shit out there to stay relevant after their failure with training GPT-5.
27B SPPO when?
>>101456703I didn't fuck around with any settings because I didn't know what they do.
>>101456821this, that's the only thing I'm waiting at the moment
>>101456952Learn to prompt, retard.
>>101456859Fair enough.In the case of these layers, you want as many of them as you can fit inside your GPU's VRAM, so that's something you should change if you can.
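With llama.cpp directly it's the `-ngl` (`--n-gpu-layers`) flag; koboldcpp exposes the same thing as "GPU Layers" in its launcher. Rough example (model name is a placeholder; raise the count until you run out of VRAM, anything above the model's real layer count just offloads everything):

```shell
# push up to 99 layers onto the GPU
./llama-cli -m some-model-Q4_K_M.gguf -ngl 99 -p "hello"
```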
>>101456062
>>101456062When the fuck are they gonna release gpt5? They still act as if they're the kings of the AI world; it's not the case anymore, claude 3.5 sonnet is now the big dog. They will die if they don't step up their game
>>101457145>they will die if they...release a bad model as gpt5, that would signal they can't innovate, that's why they won't, yet.
>>101457145GPT-4o was supposed to be GPT-5. It was so disappointing that they had to rebrand it under the GPT-4 moniker. "GPT-5" will only be released whenever it doesn't disappoint. Which will be never, so instead they will make up something about it being safer to do incremental updates from here on out and drop the GPT-x paradigm or some bullshit like that.If they released GPT-4o as GPT-5 like originally planned the entire LLM industry would collapse and start a new AI-winter.
Man. Chill. GPT-5 is in training. They'll BTFO everyone as soon as it comes out. The current competition is good. They'll be fine.
how do I RP? I really have zero experience doing RP and I feel like I'm missing out a lot...
mistral-nemo-8x12b-SPPO-orthogonal when?
>>101457233Orthogonal is a meme. You don't want the model to be unable to refuse completely when the story calls for it.
>>101457145after the election
>>101457214is this as good as LLMs will ever get?multi-modal and other frankenstein hybrids in the works would seem to suggest as much, unless some new factors are introduced
>>101457145>claude 3.5 sonnet is now the big dogI also tried this model, and I'd thought the claude models were way more cucked than the chatgpt series. I was pleasantly surprised when I realized it was the opposite. I'm not a murican, so I tried to understand why Crooks registered as a Republican even though he's a democrat. Bing chat told me to fuck off, but Claude 3.5 Sonnet was willing to explain why (he wanted to vote for a Republican other than Trump to weaken that side). That's when I realized OpenAI is fucked if they don't react. Claude is less censored and better than OpenAI at the moment.
>>101456758
>>101456758good thing it will reject your utterly shit tastes.
Nemo GGUF https://huggingface.co/second-state/Mistral-Nemo-Instruct-2407-GGUF
>>101457320What? I thought people were reporting it didn't work.
>>101457306Ah, a man of culture, I see.>>101457318Nah man, I don't do scat.
>>101457352got a tokenizer issue when i tried to convert yeah, don't know how they did it, so might be scuffed
>>101457355pedoshit and bestiality is not that far from scat though.
>>101457276>is this as good as LLMs will ever get?A new architecture will BTFO transformers and a 7b model will be as good as gpt4o; that's really likely. There's no way transformers are the dead end of machine learning, no way
>>101457214>GPT-4o was supposed to be GPT-5. It was so disappointing that they had to rebrand it under the GPT-4 moniker.I believe that as well. They had no reason to just strive for a "slightly better version of gpt4"; no one cares about that stuff, and OpenAI used to go big every time, so that was unusual for them. Maybe they reached their ceiling, but ClaudeAI definitely hasn't yet
>>101457320>LlamaEdge, powered by Rust and WasmEdge, provides a strong alternative to Python in AI inference.great, another ollama>The WASI-NN ggml plugin embedded llama.cpp as its backend.
>>101457320Does not work with kobold. Anyway, someone on plebbit reports that the model is coherent in novel continuation at 128K
>>101457276>is this as good as LLMs will ever get?define "good". are we going to get big leaps in "intelligence" in pure text gen? probably not. seems we've pretty much hit the limits of what simple scaling can provide. but there are lots of ways they can get better. there's dozens, hundreds of papers exploring ideas that would lower costs, increase control, etc. the engineering and compute can't keep pace with the research.i think we've only scratched the surface of multi modality. it WILL result in increased intelligence and usefulness. then there's combining LLMs with other algorithms to better approximate cognition.
>>101457276>is this as good as LLMs will ever get?If they keep pretraining their models on leddit and wokipedia, yeah, that's as far as we can go. Especially leddit, that place is hell on earth
>>101457276Until some breakthrough, yeah, that's what we're getting: 2-5% better performance with each new model. Safety is for fags and the space is sadly full of them.
>>101457504>>101457504>>101457504
>>101457416Stop trying to make your scat fetish happen.It's not gonna happen.
>>101457534>pedo projections seems you love scat in the end of the day huh?
>>101457495>Safety is for fags and the space is sadly full of them.That's why it was never my dream to work on a giant company like google, you have to sell your sovl and your morals to work in such a cucked environnement
>>101457549>Arguing over whose fetish is worstShut up faggots
>>101452686There are few things that make me rage harder on 4chan, than /poltards who say "you're not welcome here," as if they seriously expect the person they're talking to, to give a flying fuck.Get off the Internet, Cleetus, and go back to fucking your pig out in the barn.
>>101457443I mean, if you're trying to minimize effort what else are you going to copy?Though the main dev doesn't seem to have any FAGMAN connections so this one is going to fail.
>>101457253I mean even if it's not orthogonal, I'm assuming that an 8x12b MoE will be around the same performance as early GPT-4 versions (censored)
>>101456758Do bestiality cards count if I'm playing the role of the dog?
>>101457966Just in case you don't already know them:https://www.dlsite.com/maniax/work/=/product_id/RJ202234.htmlhttps://www.dlsite.com/maniax/work/=/product_id/RJ182625.html
hey /g/ - new to llms, just set up ollama and openwebui. it's working ok but I noticed my GPU isn't being used at all. I have an Nvidia 1080ti with 12G of VRAM using dolphin-mixtral:8x7b. nvidia-cuda-toolkit-12.5.0 is installed. how do I get ollama to use my gpu?