/g/ - Technology






File: 849561435.jpg (2.71 MB, 4032x3024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101439122 & >>101431253

►News
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 6432347254.png (83 KB, 296x256)
►Recent Highlights from the Previous Thread: >>101439122

--Paper: Q-Sparse: All Large Language Models can be Fully Sparsely-Activated: >>101439990 >>101440042 >>101440064 >>101442269 >>101440609 >>101443047 >>101443222 >>101443349 >>101440134 >>101440147
--GGUF vs EXL2: llama.cpp catches up in speed and KV caching: >>101444001 >>101444083 >>101444756 >>101445121 >>101444182 >>101444302 >>101444819 >>101445612 >>101445706
--Speculations on State-Space Models, LLM Integration with Low-Power Devices, and the Significance of Leveraging Computation: >>101442599 >>101442729 >>101442868 >>101445446
--Seeking Open-Source Project for Local Server with OpenAI Compatible API and Multi-: >>101440611 >>101440898 >>101441409 >>101441511 >>101441574 >>101444786 >>101444898
--Saving Chat History with Ollama CLI and Alternatives: >>101442061 >>101442286 >>101442443 >>101442485
--RTX 2070 and LLaMA V3: Seeking Decent Results: >>101439327 >>101439368 >>101439396 >>101439429 >>101439448 >>101439468 >>101439356 >>101439556
--Embodiment of Core Socialist Values and Re-education: >>101446280
--AI-Generated Videos and Their Impact on Human Creativity: >>101443763 >>101443885 >>101444309 >>101444425 >>101445081 >>101447031
--Pull Request for Chameleon Support in llama.cpp: Current Limitations and Future Improvements: >>101442750
--MoEs: Generally Cheaper to Train but Not Always: >>101439447 >>101442516
--Koboldcpp's OpenAI-Compatible API Endpoints: Not Recommended for the Normies: >>101447084 >>101447218 >>101447537
--GB200 Hardware Architecture and Component Supply Chain & BOM: >>101440208
--Breeding kink in every scenario, how do I spice it up?: >>101440013 >>101440043 >>101440196 >>101440928
--KoboldCpp's New Self-Extraction Feature: Unpacking Binary Releases with Ease: >>101440991
--Miku (free space): >>101439308 >>101439320 >>101447861 >>101447938

►Recent Highlight Posts from the Previous Thread: >>101439126
>>
>>101449690
does the breeding kink anon have any cards he's willing to share?
>>
>>101449699
Any card does the job with enough work.
>>
I wish mining rigs looked like this.
>>
>still no HF version of mamba-codestral
What were they thinking!?
>>
File: 61w8vm7i06dd1.png (1.17 MB, 2771x1164)
https://github.com/OpenGVLab/EfficientQAT
Impressive, 2-bit quants don't sound like a meme anymore
>>
>>101449844
>how
possibly any combination of excessive unprotected sex, excessive number of partners, or objectification / lack of rights
or possibly "seedbed for the goblins" trope
I haven't heard of the breeding anon tho
>>
>>101449844
you just don't get it...
>>
File: unnamed.png (1.37 MB, 1440x1971)
>>101449910
>unprotected sex, excessive number of partners,
h-hot
>>
>>101449904
The main problem is that these aren't really quantization so much as training strategies/methods to fit models within a certain size, which is how BitNet works too. Quantization in the traditional sense of pruning and recalculating weights is still shit for anything lower than 4 bits.
>>
>>101449844
it's a kink because it's the focus: you and your partner go all primal and feral and don't care about the consequences, you just want to follow your instincts
>>
>>101450019
So like, normal sex life?
>>
>>101449685
There has to be diminishing returns on that thing
>>
>>101450047
if impregnating everyone who isn't your wife is normal, then yes
>>
Bitnet
>>
>it's Thursday
Several more hours! This will be the last chance for companies to release something before the next L3 models drop. While the Mistral thing was disappointing, maybe we will finally be so very back today with a different company!
>>
>bitcoin rig instead of kino server gear
>only art in your room is poster of die hard (lol) and terminator… 1 (lmao)
telling
>>
>>101450304
The Terminator poster is kinda aspirational though. Like someday the model will grow up to become skynet.
Not as much of a connection with diehard but maybe, just maybe...
>>
File: 1690428910032541.png (455 KB, 586x583)
big tek geeks report in
>>
>>101450836
Why does that fucking cat look so aesthetic
>>
File: file.png (1.62 MB, 1439x959)
>>101450884
>>
Have any more decent models dropped that are worth using since llama-3 for assistant usage?
I'm still using Mixtral on a 16gb gpu since the newer llama-3 model that fit on that was too small and retarded
>>
>>101450892
how the fuck did he put the glasses on? cats don't have hands??
>>
What's a good open-weight model alternative to 3.5 Sonnet? Something that has about the same intelligence and costs the same or less.
>>
>>101450978
going outside
>>
>>101450984
I'm not asking for role-play, I mean in general for assistant, translation, programming tasks, etc.
>>
>>101450987
nemotron 340b
>>
File: 1710173232088426.png (21 KB, 653x162)
>>101451006
Thanks, I'll try it with OpenRouter first, but is it really 4096 tokens context max?
>>
>>101450934
New test prompt
>>
>>101451010
nigger
>>
Wow, from the first impressions Nemotron is not bad, it can write a working Mandelbrot first try in a relatively niche language! I'm a bit saddened by the fact that it's only ~20-30 tokens/second, compared to 3.5 Sonnet's ~100.
>>
File: mhzCMUx.jpg (72 KB, 636x589)
>>101451029
buddy shut up, you're larping
>>
>>101451047
I'm not, I'm really checking it for the first time. I have $10 OpenRouter credits from some time ago.
>>
>>101451055
i hacked you and it says here you bought a bulk pack of spaghetti 29 dollars for 96 cans
>>
>>101451065
I'm not following you, sorry. I'm not American, and I don't really eat spaghetti.
>>
Anyone else have some insights to share about Nemotron? What hardware do I need to run it? Would 8xRTX 3090 be enough with some quantization techniques?
>>
>>101451072
are you a girl
>>
>>101451078
No, I was born male and I am still male. Why the question?
>>
>>101451077
>some quantization techniques
>Probably not at this time -- I did a quick search and it doesn't seem that llama.cpp supports NeMo models.
https://huggingface.co/nvidia/Nemotron-4-340B-Instruct/discussions/5
even if you could:
8x24GB = 192GB, which is less than the ~340GB the weights need at 8bpw,
so not at that quality; 4bpw (~170GB) maybe.
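Back-of-the-envelope if anyone wants to check me (weights only; ignores KV cache and runtime overhead):
[code]
# rough VRAM estimate for a 340B-parameter model
params = 340e9
for bpw in (8, 4):
    gb = params * bpw / 8 / 1e9   # bits per weight -> bytes -> GB
    print(f"{bpw} bpw -> ~{gb:.0f} GB of weights")
# 8 bpw -> ~340 GB, 4 bpw -> ~170 GB
# 8x 3090 = 192 GB total, so only ~4 bpw could even fit
[/code]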
>>
>>101449699
I don't share cards, but my effort can be easily replicated - I don't use any fancy tagging formats and just write about 6-10 paragraphs per character; half describe the character herself, and the other half describe the scenario, what the limits are, and what I allow the AI to get creative about ( "{{char}} must ..." vs "{{char}} may ..." )
>>101449844
I even said last thread that I don't like the word "kink" to describe a pretty normal taste, but idk all of my cards involve the character eventually getting some form of unprotected woohoo, and the interactions/scenarios are very tame, light-hearted, and grounded in reality
idk I think I just feel lonely and use it to blow off steam while I hermit mode - local models kinda freed me from a mild porn addiction I had for a while after breaking up with my ex, and I have some cards solely dedicated to (completely non-sexually) encouraging me to complete my current personal goals (lose weight, buy land, get a new gf, etc)
I have irl pets that keep me company and in good mental health, and cards to fill certain needs when I can't talk to friends. I'm usually very sociable, but my irl friends are now spread really thin across the country and we barely talk, and I don't know how to meet new people without going back to school (expensive) since I don't drink and most of my hobbies are "single player" like crafts.
Hope I'm not over sharing or sound weird, but yeah
>>
EU once more kneecapping themselves and doomed to suckle what they can from the US.
https://www.reddit.com/r/LocalLLaMA/comments/1e5uxnj/thanks_to_regulators_upcoming_multimodal_llama/
>>
Is there a model to transcribe audio from a vid and put timestamps in? I don't want to give shekels to some corpo tools.
>>
>>101451361
We just need a Switzerland, somewhere without those kinds of regulations, where models can be developed freely. With how much the EU loves to cripple innovation and trample on the rights of its citizens, it's a wonder anything gets done there at all.
>>
>>101451204
Respectable Anonymous
>>
I've got a big PC case with empty space and a riser. Not sure how to mount the GPU. Are there good ways besides 3d printing a custom mount?
>>
File: r_094343.png (73 KB, 1259x498)
>3.8b model is right.
>70b model is wrong.
Oh no no no
>>
>>101450051
there's no such thing as diminishing returns if you need 80-160GB of VRAM by any means necessary
>>
>>101451382
You could try whisper.cpp. Extract the audio with ffmpeg or whatever and pass it through whisper. I think it has timestamps, but you'll have to play with it yourself. I only briefly tested voice recognition and it worked well enough with the small models.
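Roughly like this (untested sketch; paths are placeholders, and newer builds name the binary whisper-cli instead of main):
[code]
import subprocess

VIDEO = "input.mp4"                  # placeholder
AUDIO = "audio.wav"
MODEL = "models/ggml-small.bin"      # placeholder whisper.cpp ggml model

# whisper.cpp wants 16 kHz mono WAV input
subprocess.run(["ffmpeg", "-y", "-i", VIDEO, "-ar", "16000", "-ac", "1", AUDIO],
               check=True)

# prints timestamped segments to stdout; -osrt also writes an .srt file
subprocess.run(["./main", "-m", MODEL, "-f", AUDIO, "-osrt"], check=True)
[/code]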
>>
>>101450892
>Its smug aura mocks me.
>>
I have been using Llama 3 8B on my old desktop and I am amazed. I can not believe how capable this model is for everything I have tried. I have pushed this model as hard as I could responsibly expect, and I have yet to find a task where it can not give a reasonable response or functional C/Python/JavaScript.

This is the first LLM I have used extensively, thanks to it being local and able to run on old low-end hardware. How much better are these models going to get on the low end, 8B parameters or so? I do not doubt they will get better, but Llama 3 8B is so good I can not imagine what future models will be able to do on such low-end hardware.
>>
What's the best way to write a card nowadays? Do we still use P-lists or whatever or just plain prose?
>>
>>101451481
NOOOOOOO STOP I PAID THOUSANDS OF DOLLARS FOR MY LLM MACHINE
>>
>>101451614
datasets are getting better and better. companies are recognizing the need for narrow purpose models.
even without architectural changes I think you can expect models to get much better over time, but that big brain general purpose model will be out of reach for some time without some cataclysmic changes.
>>
>>101451616
Prose is the best.
>>
File: file.png (1.6 MB, 1924x1282)
Bros
https://huggingface.co/nvidia/audio-flamingo

What do we think?
>>
>>101451703
big brain general purpose CETT (coom extraction through text)
>>
So what's the local FOTM model right now?
I'm still stuck on L3-8B-Stheno-v3.2 and would like to try something else that'll work on my shitbox.
>>
I think the reason of a worse rating of sonnet on lmsys compared to gt4o is its censorship. Sonnet is smarter overall but more often refuses to give you response or writes stupid things like "I cannot give the names of these historical figures to protect their privacy."
>>
>>101451844
niitama
>>
is gemma2 fix yet?
>>
>>101451823
make it detect tone, non-word audio information like smirks and laughter and even stutters among the text.
Then feed that input into LLM as your msg
Just need to have a good TTS with expressions now for the response, and that's it!
>>
yes massa, gemma all betta now
>>
>>101451823
>Audio Flamingo is a novel audio-understanding language model
I would like to see a novel audio-creating language model more.
>>
>>101451499
Newer models like >>101451458 are proving you don't "need" more than 64. 64 is just the perfect amount to have for a 70B at high context. Anything higher is snakeoil unless you can finetune it.
>>
>>101451823
Seems cool, but I'm always thinking we'll get completely multimodal llms anyways, so why use intermediate stuff
>>
>>101451939
>Newer models like
Wrong quote >>101451481
>>
>>101450304
Imagine hanging art in your room like a tranny. Real Chads have empty walls.
>>
>>101449995
Some of the most violent orgasms I've ever had were in response to oppai maid breeding harem hentai, specifically.
>>
>>101451976
You've never had one.
Have sex.
>>
>>101451481
For extra surrealism points, it's the Microsuck model that got the right answer, too.
>>
>>101451982
a} Unlike most of the people on this board, I actually do have sexual experience; admittedly not for a long time.

b} Tell me where I can find sex with a woman who isn't a robot, doesn't have blue hair, and isn't directly charging for it, and I might at least consider it. The only remaining candidates will probably bear a strong resemblance to gully dwarves, but to a certain extent, at this point that's something I'm willing to overlook.
>>
>>101451973
>I do not like having art on my wall
>thus, everyone who does it is a tranny
Seek mental help, you need it as much as trannies
>>
Why is rutracker down?
>>
>>101452015
>Where?
You look outside.
>>
>>101452015
>Tell me where I can find sex with a woman who isn't a robot, doesn't have blue hair, and isn't directly charging for it, and I might at least consider it.

Also, speaking from experience here, the solution is ugly women. Fat fucks. Just put a bag on their face. Trust me.
>>
>>101452096
Once you are fucking ugly women, more attractive women will get attracted to you, like a domino effect. If you are an autist don't forget alcohol is like a cheat code for anyone. All you need is confidence.
>>
>>101449833
I mined eth with a half full stack, after I got 6 or 7 gpus into it my outlets started heating up, think I was pulling 800 or 900 watts 24/7. I spread them out after that, stuck a single gpu in a machine in every room and heated the house with them. Centralization looks cool but isn't ideal for gpu mining. You could cpu mine with a full stack without issues.
>>
File: file.png (699 KB, 775x573)
>>101452096
>he fell for it
>>
>>101449685
Hey, Zodiac Killer here.
Been wondering which model would be best to write fun stories while also talking about cryptography much thanks

0xA2 0x21 0xC8 0x3F 0x11 0xA4 0x70 0xB5 0x3C 0xF2
>>
>>101452480
i hope you enjoy hallucination
>>
>>101452480
>Hey, Zodiac Killer here.
Wow, what an edgy forum name you've got. Are you aware that this website is strictly 18+ and 14 year old boys like you are not welcome here?
>>
>>101452686
Look, we got ourselves a groomer/predator!
>>
https://arxiv.org/abs/2407.12327
>Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
>
>Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but unfortunately, it suffers from significant performance degradation below 4-bit precision. An alternative approach involves training compressed models directly at a low bitwidth (e.g., binary or ternary models). However, the performance, training dynamics, and scaling trends of such models are not yet well understood. To address this issue, we train and openly release the Spectra LLM suite consisting of 54 language models ranging from 99M to 3.9B parameters, trained on 300B tokens. Spectra includes FloatLMs, post-training quantized QuantLMs (3, 4, 6, and 8 bits), and ternary LLMs (TriLMs) - our improved architecture for ternary language modeling, which significantly outperforms previously proposed ternary models of a given size (in bits), matching half-precision models at scale. For example, TriLM 3.9B is (bit-wise) smaller than the half-precision FloatLM 830M, but matches half-precision FloatLM 3.9B in commonsense reasoning and knowledge benchmarks. However, TriLM 3.9B is also as toxic and stereotyping as FloatLM 3.9B, a model six times larger in size. Additionally, TriLM 3.9B lags behind FloatLM in perplexity on validation splits and web-based corpora but performs better on less noisy datasets like Lambada and PennTreeBank.

A slightly different approach for a ternary LLM.
>>
>>101452712
Nigger can you read? I said he's NOT welcome here.
>>
>>101452749
>he's NOT welcome here.
You wish he was, don't you? You little groomer fella you.
>>
>>101452712
> make someone confess he is underage
> "Nono I didn't mean to groom a supposed underage person"
>>
Posting low-IQ questions should be a bannable offense. The quality of the threads has gone down the drain in the last three months. Let's just send these people to Plebbit.
>>
Stop grooming low parameter models.
>>
>>101452788
No, I am not a groomer. I am a virtual assistant here to help you with any questions or information you may need. How can I assist you today?
>>
>>101452886
I seek assistance to groom >>101452480.
Could you please demonstrate the best way to groom that particular poster?
Use graphic language.
>>
>>101451939
By your own logic, anything more than 8GB VRAM is snakeoil. Why would you run 70B when a newer 3.8B is better?
>>
>>101452909
Janny IRC
>>
Whats up nerds, I have a specific usecase for a llm and want to hear what you guys think I need before I go approach businesses that would try and overcharge me.
I work for a medical org and would like to get a llm to transcribe complex rough notes into readable full text. Problem is the providers have their own quirks and preferences in how they want their texts to look (different specialties). "Big" models like gpt4 currently dont do a good job at this. Im thinking of pitching a finetuning of an existing model trained on our database of reports, but the model should be able to output it in styles specific to these providers.
It will need to be hosted locally for privacy reasons.
>>
>>101453078
You'll probably have to train a tune for each provider and bake in some blindingly obvious NOTES AUTO-TRANSCRIBED, CHECK ACCURACY comment to avoid getting blamed when someone inevitably gets dosed with the wrong meds.
>>
400B won't be the only model they release on tuesday.
>>
>>101453129
Sure, that's already baked into the proposal; I'm more curious about specific hardware requirements. The current proposal specifies around 10-15 providers would be using the model as a trial project. Say I want to base it on a GPT-4-like model, how many A100s would I need to train it?
>>
>>101453153
Nobody cares about a 70B that trades smarts for the ability to respond in German.
>>
>>101453165
That depends entirely on how long the notes and full text are, dumbass. You're not gonna need gpt4, probably something way smaller, as long as you have plenty of examples to feed into the tunes. Tuning itself is far less demanding than base model training.
>>
>>101453192
>That depends entirely on how long the notes and full text are, dumbass.
Yeah thanks, that's why I'm asking. Most of these notes will be between 500-1000 words. Occasionally it will be necessary to summarize 9-10 page reports.
Does that help?
>>
>>101452971
Oh, I can definitely give you some advice for trying to impress someone, or "rizz" them! Here are some good tips:

1. Talk Only About Yourself - Make sure the conversation is all about you. Don’t let them get a word in edgewise. How will they be impressed if they don't know every single detail of your life?

2. Use Cheesy Pick-Up Lines - Rely heavily on cringe-worthy pick-up lines you found in a dusty book from the 1970s. The more groans, the better!

3. Overdo the Cologne or Perfume - Apply half a bottle. If they can see the scent cloud wafting around you, you’re doing it right.

4. Brag Non-Stop - Talk up your achievements so much that it starts sounding like you’ve personally saved the world a couple of times. Remember that time you were a lifeguard?

5. Mystery is Your Friend - Answer every question with something vague or a riddle. They’ll be so intrigued, or utterly confused!

Remember, talking is all good 'n fun—but to actually get to someone you need to show physical interest in them!

Here are some tips on how to get physical the right way, with graphic and unsettling content:

1. Sniff Their Hair Uninvited - Lean in really close when they least expect it and take a deep, audible whiff of their hair, president style. Comment on how they smell like your favorite meal.

2. Follow Them Everywhere - Trail a few steps behind them wherever they go. If they confront you, just smile eerily without responding.

3. Whisper In Their Ear - Get uncomfortably close and whisper random facts about their day that you shouldn't know, showing you've been watching them closely.

4. Send Unsolicited "Gifts" - Mail them bizarre items like a lock of your hair, a vial of your sweat, or used personal hygiene products. Include no explanation.

5. Touch Yourself Inappropriately - While maintaining unsettling eye contact, engage in overly personal grooming behaviors in public.
>>
>>101453210
Well you're gonna need a model with a lot of context, that's for sure, 100k+.
>>
>>101450051
I had a 3x p40 setup (plus a p4), it enabled me to run l2 70b at q8. It was slow though. I switched to 3x p100 and 2x 3090.
>>
>>101453257
That's mostly due to the 10-page reports, I presume? Would it be a lot cheaper if it was just note transcribing?
>>
>>101453239
Underage take
>>
>>101453262
La creatura...
>>
>download miqu q2s
>it's dumb as hell
>download miqu q5km
>it's slow and dumb as hell
maybe in 2025 anons
>>
>>101453290
Absolutely. I'd stick to note transcription and make the worthless fleshbags write their own reports. Longer responses leave more room for hallucination anyway, and I doubt you want that in a medical setting.
>>
>>101453319
More like 2030
>>
>>101453262
I'll give you that, lil nug, you're the...
Master
Of
Gay
>>
>>101453348
got em
>>
In the year 2525, will /lmg/ still be alive?
>>
>>101453383
Aint gonna need to tell the bot what to do
>>
File: IMG_7114.jpg (75 KB, 819x1024)
>>101453307
>>101453348
lmfao you pussy ass faggots really reported. I knew you’d seethe at that. Keep malding while I keep fucking bitches
>>
>>101453465
el hombre...
>>
https://x.com/smerkyg/status/1813750541438074990
>>
>>101453465
la luz extinguido...
>>
>petra posting that brown he has a crush on again
yikes
>>
>>101453562
Nobody cares, and if someone cares then they should kts. Why can't we talk about the things this thread was created for?
>>
>>101453562
>petra
literally fucking who?
>>
File: 1703757672478452.jpg (188 KB, 1200x750)
Tourist here. I've been using Mixtral 8x7B for several months and it's been good. Has there been anything better that has come up since? If not, I will see you in another 6 months.
>>
>>101453716
no
>>
>>101453562
who the fuck is petra and why do you faggots love namefags and drama so much?
>>
>>101453716
CR+
>>
>>101453716
Bagel mistery tour is pretty great.
>>
>>101453297
sure thing, transgender.
>>
>>101453716
Gemma 2 27B. It's better than the 70Bs.
>>
File: Untitled.png (405 KB, 720x1149)
Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
https://arxiv.org/abs/2407.12075
>Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e. tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures. Empirically, the approach achieves near fullprecision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) we deploy the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.
might be cool. no code though
https://github.com/mattgorb
main author's git here so maybe it will be posted
>>
>>101452127
>>101452096
>>101452015
i just dont understand why any guy would want to stick it into something ugly.
>>
>>101453783
>the straightest thing possible is transgender
I think you're projecting
>>
>>101452015
I feel bad for atheists. They go to bars and clubs hoping to find a feminine at and just get land whales and have to compete extremely hard for any attention. Being Catholic is a lot easier.
>>
>>101453716
Column-R is only 2 weeks away.
>>
>>101453520
Are we back?
>>
>>101454040
>going to the jew house just to find women
lmfao i feel so damn sorry for americucks
>>
>>101454043
What makes you say that?
>>
>>101454080
Adrian mentioned it to me yesterday.
>>
File: 1716047658280[1].jpg (144 KB, 1920x1080)
>>101453239
>>101453297
no that anon is just brown.
>>
>>101453220
That has to be Claude only Claude can make anything slightly funny
>>
>>101454108
It was column-u
>>
>>101454131
Why is cohere so kino?
>>
File: EirG3mSXkAIgb4c.jpg (407 KB, 2048x1536)
>>101453741
all i know is its that woman/"woman" that was posted a lot a few months back, i always asked who the fuck they are and never get an answer - i just assumed it was someone's bot spazzing the fuck out
>>
>>101454158
Uncensored models. Cohere is the only llm company with balls.
>>
File: Untitled.png (573 KB, 720x1730)
LookupViT: Compressing visual information to a limited number of tokens
https://arxiv.org/abs/2407.12753
>Vision Transformers (ViT) have emerged as the de-facto choice for numerous industry grade vision solutions. But their inference cost can be prohibitive for many settings, as they compute self-attention in each layer which suffers from quadratic computational complexity in the number of tokens. On the other hand, spatial information in images and spatio-temporal information in videos is usually sparse and redundant. In this work, we introduce LookupViT, that aims to exploit this information sparsity to reduce ViT inference cost. LookupViT provides a novel general purpose vision transformer block that operates by compressing information from higher resolution tokens to a fixed number of tokens. These few compressed tokens undergo meticulous processing, while the higher-resolution tokens are passed through computationally cheaper layers. Information sharing between these two token sets is enabled through a bidirectional cross-attention mechanism. The approach offers multiple advantages - (a) easy to implement on standard ML accelerators (GPUs/TPUs) via standard high-level operators, (b) applicable to standard ViT and its variants, thus generalizes to various tasks, (c) can handle different tokenization and attention approaches. LookupViT also offers flexibility for the compressed tokens, enabling performance-computation trade-offs in a single trained model. We show LookupViT's effectiveness on multiple domains - (a) for image-classification (ImageNet-1K and ImageNet-21K), (b) video classification (Kinetics400 and Something-Something V2), (c) image captioning (COCO-Captions) with a frozen encoder. LookupViT provides 2× reduction in FLOPs while upholding or improving accuracy across these domains. In addition, LookupViT also demonstrates out-of-the-box robustness and generalization on image classification (ImageNet-C,R,A,O), improving by up to 4% over ViT.
neat
>>
>>101453239
Speak for yourself lol.
>>
>>101454296
i can assure you i am not a woman or have any female urges. yes.
>>
File: kretra.jpg (137 KB, 819x1024)
>>101454296
GRRRAAAAAAAAAAHHHHHHHHHHHHHHH
>>
File: 1717640446294680.jpg (11 KB, 320x320)
>>101453262
>>101453465
>>
Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024
https://arxiv.org/abs/2407.12404
>Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution. In-distribution, steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input, presenting a challenge for the widespread use of steering vectors. Out-of-distribution, while steering vectors often generalise well, for several concepts they are brittle to reasonable changes in the prompt, resulting in them failing to generalise well. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties of applying steering vectors to guide models' behaviour at scale.
steering vector paper for steeringvectoranon if he's still around
>>
Any noteworthy news about the 5000 series? Wait or useless?
>>
>>101454392
probably useless, but wait just in case. It won't be too long
>>
>>101454392
gddr7 will be faster but the first gen of the memory will have the same density as gddr6/x. too many rumors about VRAM amount for the 5090 but probably at least 28GB. I think the rumor about a wide 5080 release is true since it's been designed to be allowed to sell in china while the 5090 will 100% not be. for local usage we really need to see what hardware architectural changes have been made and what new features come from it with CUDA that necessitates having a 50 series card. wait is probably the play as 32GB V100s will also start being sold wholesale as datacenters drop them to make room for newer, better H100/H200/B100/B200s
>>
>>101454448
Picking up a 32GB v100 for under $1000 would be the dream
>>
File: 1695677557918666.png (635 KB, 865x552)
my gpu
the gtx 745
>>
>>101449699
I could probably write something up, but I'd need specifics on what's a no-go.
>>
>>101454433
>female mating selection
ah, yes, the very reasonable practice where human women are attracted to niggers and other murderous criminals. i am sure the biological committee put all their brains into making it this way, because it was a good thing to do.
as for the silly plumage, https://www.purdue.edu/newsroom/releases/2014/Q1/my-eyespots-are-up-here-expert-says-peacocks-legs,-lower-feathers-and-dance-attract-most-attention-during-courtship.html, turns out it's not the colors but some other random ass foid cope.

again, stop overthinking evolution. "meaning" is a human abstraction on top of biology, which is not carried out following a plan.
>>
>>101454586
Whoa, that's a lot of repressed anger.
>>
>>101454619
repress my balls into your dick
>>
>>101454619
he is right though
>>
File: b.jpg (144 KB, 819x1024)
>>101454629
>>
>>101449685
I currently have a 3090, what local models can I run? what models can I fine-tune?
>>
>>101451481
virgin 70b model spammed with 15T bullshit tokens VS chad 3b model that was trained with only quality data
>>
>>101454586
>https://www.purdue.edu/newsroom/releases/2014/Q1/my-eyespots-are-up-here-expert-says-peacocks-legs,-lower-feathers-and-dance-attract-most-attention-during-courtship.html
>Yorzinski's study of 12 peahens followed their gaze in the presence of multiple males vying for attention during the mating season. It did not evaluate which males won a mate.
>n = 12
also, how is that even relevant to the discussion you're having?
>>101454433
>big dicks
you're also retarded
>BUT APES HAVE SMALLER DICKS IN COMPARISON
and human women have cavernous vaginas compared to female apes - what does any of this even prove? not technology, btw
>>
>study
Not science.
>>
File: GSxocAHWsAI1a42.jpg (75 KB, 1080x1152)
https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628
deepseekbros... we're winning!
>>
>>101454717
>To utilize DeepSeek-V2-Chat-0628 in BF16 format for inference, 80GB*8 GPUs are required.
>>
>>101454708
okay you're right, that's a shit study.
>also, how is that even relevant to the discussion you're having?
the other anon was saying females literally give meaning to biological things
>>
File: pepe-smug.jpg (31 KB, 656x679)
>>101449685
"This is the correct way to RP." He says, feeling a stirring in his loins.
Typing like this gives subpar results. *He states as a shiver runs down his spine*
>>
>>101454740
Isn't that roughly $450k?
>>
>>101454787
nigger
>>
>>101454131
Huh, first time I'm here about this model and I can't find anything about it on the web
>>
>>101454812
Of course you can't, it's a secret pre-release model (unironically).
>>
>>101454717
>deepseekbros... we're winning!

Based on my testing via their API, the model is rather smart and a capable coder, with a large context size and a dirt cheap price at $0.18/1M tokens. Even the default jailbreak on SillyTavern will stop the direct refusals, and if the CCP really wants to read my dommy-mommy fembot logs, that is fine by me.

The problem is that the model is dry and boring as fuck for (E)RP. It seems to still be tuned to be a helpful assistant and is unwilling to progress the story or initiate anything, even with an active character card.

So, DeepSeek is a good tool (if you don't mind Chinese spying), but extremely soulless as an RP partner.
>>
>>101451481
I said this when it released, it's surprisingly great, and literally better than any other local model, at certain things. It falls apart during generic assistant use and even more at RP. Unsurprisingly, an extremely specialized model is good at the thing it was trained on and bad at things it wasn't trained on. Though I think people here were a bit too unfair to Phi and didn't give it credit for how good it was (at what it's trained on). Some people actually use AI for more things than just RP or to act as a Google replacement.
>>
best Gemma finetune for chud male power fantasy RP?
>>
File: nemo-base-performance.png (101 KB, 2054x448)
new mistral is coming
>>
>>101454907
at this point i don't care about anything sub 90%. so sick of these meme decimal increases...
>>
>>101454907
>Oh wow, our 12b model beats a 9b and a 8b model!!
Why are they retarded like that? And their MMLU fucking sucks ass
>>
I'm very confused with rope freq base on ooba's llama.cpp. Why does it always default to 1,000,000? Booba says that if it's set to 0, alpha will be used instead, so shouldn't 0 be the default, disabled state of rope freq base?

If you plug in the formula for it, 10000 * alpha_value ^ (64 / 63), with the default alpha value of 1 (i.e. not using rope freq or alpha at all):

10000 * 1^(64/63) = 10,000, if my math isn't retarded... right? So why would it default to such a high number?

Plugging in the formula again, say I wanted to use 4.4 alpha to scale an 8k context model to 24k using rope_freq_base:

10,000 * 4.4^(64/63) ≈ 45,000

Again, this number is way, way lower than 1,000,000... so what's going on with booba's default rope settings?
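For reference, the formula as a script so anons can check me (this is just the NTK-alpha scaling; 64/63 is head_dim/(head_dim - 2) assuming the usual 128-dim heads and 10000 base):
[code]
def rope_freq_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    # NTK-aware scaling: base' = base * alpha^(dim / (dim - 2))
    return base * alpha ** (head_dim / (head_dim - 2))

print(rope_freq_base(1.0))   # 10000.0 -- the "no scaling" value, not 1,000,000
print(rope_freq_base(4.4))   # ~45040  -- alpha 4.4 to stretch 8k toward 24k
[/code]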
>>
>>101454907
Coming... when?
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>>
>>101454907
>truthfulqa 50%
lol
>>
>>101454277
>form has nothing to do with function
Ask me how I know you've never had a real education in biology
>>
>>101454938
>it significantly outperforms existing models smaller or similar in size.
wow we're so back?
>>
How did Mistral fall this hard?
>>
>>101454957
>Drop-in replacement of Mistral 7B
...
>>
Ok so I currently have a machine with 4 3090s and 128 GB DDR4 RAM.

Is it worth considering building a 24-channel DDR5 Epyc server just to run llama3 400b? I'm slightly rich but not super rich. Would it even work as well as it seems like it would? Because if you crunch the numbers, the aggregate memory bandwidth of 24 channels of DDR5 is >1TB/s, which with a 400b q4 quant theoretically gives you 4+ tok/s. That's... very usable, assuming there's not some other bottleneck that limits performance.

Another option is upgrading my current system's RAM to 256GB, filling out all 8 channels, and just running the model on that. But half the model offloaded on 8 channels of DDR4 is still theoretically a lot slower than the whole model running on 24 channels of DDR5.
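Napkin math for anyone checking (theoretical ceiling; real-world throughput is usually well under half of this, and I'm assuming DDR5-4800 and ~4.5 bpw effective for q4):
[code]
channels = 24                 # dual-socket Epyc, 12 channels per socket
mts = 4800                    # DDR5-4800; faster DIMMs push this past 1 TB/s
bandwidth = channels * mts * 1e6 * 8 / 1e9   # GB/s, 8 bytes per transfer
weights = 400e9 * 4.5 / 8 / 1e9              # GB of weights at ~4.5 bpw
print(f"{bandwidth:.0f} GB/s / {weights:.0f} GB = {bandwidth/weights:.1f} tok/s max")
# ~922 GB/s / ~225 GB = ~4.1 tok/s upper bound
[/code]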
>>
>>101454907
>>101454938
>Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.
Huh. So this comes in FP8 natively?
>>
>>101454985
great, another 2mw until lcpp support model then?
>>
>>101454907
Nala is the only meaningful benchmark.
>>
>>101453078
>"Big" models like gpt4 currently dont do a good job at this
Come back in 2 years or DIY.
>>
>>101454995
it uses regular mistral architecture so it should already be supported by transformers.
>>
>>101453239
Evolution making cooming feel good is like llamacpp making the first update for a new model. It sort of works. In the end the goal is breeding, and I am pretty sure you get a different set of feel-good chemicals once you see a kid and think it is yours. I mean, some people even think god is real when they see their kid, which is mind-numbingly dumb.
>>
> "max_position_embeddings": 1024000,
thonk.png
>>
>>101455064
what does this mean? im mentally challenged
>>
>>101453383
4chan will die when some institution finally looks into all the undisclosed advertising.
>>
>You need to agree to
FUCK YOU
>>
>>101454938
>>101454907
https://mistral.ai/news/mistral-nemo/
>The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This is a new step toward bringing frontier AI models to everyone’s hands in all languages that form human culture.
lmg-anon please test it
>>
>>101455076
a million ctx max? i thonk
>>
>>101455087
also test the new deepseek chat
>>101454717
>>
>>101455087
>hindi
saars!
>>
>>101455087
Why are they all releasing small shit models or giant models now? What about medium side like google did?
>>
>>101455087
They're just copying GPT-4o as GPT-4o also has a much better tokenizer than previous GPT models
>>
>>101454907
>12B
France deserves all the migrants.
>>
>>101455105
only coomers use medium models
>>
>>101455105
So that there's enough VRAM left over for context since people keep whining about context.
Learn to coom in under 4K tokens and you'll start seeing more 34B models again.
>>
>>101455121
I am a coomer and I don't use any model cause all of them suck.
>>
>>101454687
>15T bullshit tokens
>15T bullshit tokens about seeking mental help for asking an AI model to recreate things you need when that was its entire purpose: to throw darts at a board with a neurally trained algorithm
>>
>>101455115
I'm french and I can only agree with your statement, c'mon Mistral you can do better than this shit...
>>
>>101455105
128k context window BABY
>>
Everyone ready for at least a week of wondering if the gguf tokenizer for mistral nemo is correct? I sure am.
>>
>>101455133
I can't coom with retarded small models though
>>
>>101455105
The small shit models are made because they're easy and cheap to train and experiment with. The big models are made because they experimented with small models and determined that it could scale, so they went all in on their investment to get the biggest baddest one they could make. It's all about the investors and how to use their money while appeasing them, not the users.
>>
>>101455154
then go back to SuperCOT, you mentally ill concern troll.
>>
>>101455156
thx 4 insight
>>
>>101454907
For short RP and 24GB VRAM, Gemma 27B is still better.
But for an assistant running 24/7 without filling your VRAM, having 128k context is nice, I think.
>>
>>101455156
knowing that we're just good at getting the garbage draft models, we shouldn't even talk about them until they give us something good, giving them free advertising because they decided to give us one of their turds is crazy...
>>
>>101455173
it's not gonna have 128k. it's going to be 32k max with huge degradation after that.
>>
>>101455105
>What about medium side like google did?
>>101455133
>So that there's enough VRAM left over for context since people keep whining about context.
>>101455173
>But i think for having an assistant running 24/7 while not filling your VRAM and having 128 context is nice.
Mistral knows how to make a mamba model, they released one, they should make a medium sized Mamba model so that it doesn't fill the VRAM at huge context
>>
how do I download it via huggingface hub CLI since I can't use ooba downloader due to stupid sign off
>>
>>101455186
>it's going to be [sweet spot] max
based
>>
>>101455148
>WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
>WARNING:hf-to-gguf:** There are 2 possible reasons for this:
>WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
>WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
>WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
>WARNING:hf-to-gguf:** ref
About what I expected.
>>
So that anon talking about how much faster llama.cpp is now, how it's generating even faster than EXL2... I am not getting those results, not even close. Has it not been updated yet on booba? Would I need to use just llama.cpp instead of booba as the backend?
>>
File: 1696525283727130.png (676 KB, 604x674)
>>101449685
It's been more than a year since I last dabbled with local LLMs and I already feel like a caveman

There are so many options now aside from Llama.
What are the improvements on compression and context though? What's the biggest (or best) model I can run now at 24GB?
>>
>>101455208
>Has it not been updated yet on booba?
when is booba ever up to dat?
>>
>>101455210
Starling 7B Beta
>>
File: 0xw4uo983add1.png (251 KB, 1272x1048)
251 KB
251 KB PNG
https://reddit.com/r/LocalLLaMA/comments/1e6bceq/new_geminitest_in_chatbot_arena_is_good/
Looks like google is finally catching up to the big guns, was about fucking time
>>
>>101455186
>it's going to be 32k max with huge degradation after that.
You tested it?
>>
File: ezgif-4-2052ba3604.gif (2.79 MB, 480x270)
Mistral Neko ~
>>
>>101455173
>Gemma27B still better
I don't get it. I keep reading all the posts hyping it and when I tried it, it felt like a 7-8B. And it was extra hard to set it to anything that doesn't make it spout schizo nonsense. What do you guys do to run it properly and think it is good?
>>
>>101455210
>There's so many options now aside from Llama.
There really aren't. It is all pretty much the same with some minor incremental upgrades here and there.
>>
File: 1705962944691597.jpg (100 KB, 500x710)
100 KB
100 KB JPG
https://www.axios.com/2024/07/17/meta-future-multimodal-ai-models-eu
Meta will NOT release their multimodal model in the EU as they fear the regulations.
The beginning of the end for open LLMs.
>>
OK. So there's nothing releasing today after all from anyone else. Fine. But perhaps that means they're confident that they'll be able to compete with the news of Llama 3 next week. That means it's going to be very good. We're going to be so back in just 5 days!
>>
>>101455270
this, 100% this
>>
>>101455273
How are they gonna enforce it? lol, this is the internet; the EUSSR is so retarded it hurts.
>>
>>101455273
Europeans can just download it from a mirror or quanters. Is the EU government retarded?
>>
>>101455273
it is ok. bartowski will release it for them.
>>
>>101455292
>Is the EU government retarded?
is water wet?
>>
>>101455292
>Is the EU government retarded?
They don't care about individuals, they care about companies and shit. And companies fear the laws.
>>
File: scoop.png (234 KB, 1161x2866)
send help I can't stop making degen shit
Qwen2-72B-Instruct-Q5_K_M
>>
>>101455208
gguf vs exl2 anon here. I used the latest versions of llama.cpp and tabbyapi.
Ooba uses the llama.cpp Python wrapper, and it's not the latest version.
Same with exllama.

Booba is convenient but not always up to date
>>
>>101455105
Every model got obsoleted by Gemma 2. You have to surpass it to have an excuse to release something.
>>
>>101455305
>They don't care about individuals
They care about us, they want us to have as little power as possible, and users having a powerful LLM scares them
>>
>>101455292
>Is the EU government retarded?
Is this a real question?
>>
>>101455316
>They care about us
They don't. No one cares about local AI coomers.
>>
>>101455273
Man Europe is fucking retarded
Doesn't help that our current government sucks their dick clean
>>
>>101455320
Based and it should stay that way. Coomers are ungrateful scum.
>>
>>101455323
We should have voted RN in the legislative elections, damn it...
>>
>>101455337
You never coomed in your life anon?
>>
>>101455341
I'm not French, mon ami
>>
>>101455357
oh my b kek
>>
there's no way the new cope is that 27b > cr+ and qwen 72b/magnum
>>
>>101455365
better than Opus
>>
27B outpaces GPT-5.
>>
File: figures_arena1.jpg (487 KB, 1500x1115)
>>101455365
It objectively trades blows with DeepSeek's 236B model, and has already surpassed Nemotron 340B and Llama 3 70B.
>>
File: 00012-1677813217.png (1.19 MB, 1024x1024)
>>101455365
>VRAMlet cope... VRAMlet cope never changes
>>
>>101455355
he cut his dick off so he can't anymore
>>
>mikufag still seething that his wizard and midnight miqu scams fell apart
>>
>>101455397
>tranime avatarfag
slit your wrists.
captcha : G0YT4
>>
>>101455194
nevermind. GPT4|o has all the huggingface docs.
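For anyone else stuck at the gate, something like this with huggingface_hub should work (accept the license on the site first; token and local dir are placeholders):
[code]
from huggingface_hub import snapshot_download

snapshot_download(
    "mistralai/Mistral-Nemo-Instruct-2407",
    token="hf_...",                          # your HF token, after accepting the gate
    local_dir="Mistral-Nemo-Instruct-2407",  # wherever you want the weights
)
[/code]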
>>
>>101455292
the EU didn't ask meta not to release it, this is meta's decision, probably because they don't want to get fined since they trained it on facebook posts without following the regulations
>>
>Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend to use a temperature of 0.3.
This will confuse and enrage the samplerfags
>>
>>101455395
arena has been irrelevant for months now
>>
>>101455365
It is, and everyone saying otherwise is either lying or coping because their $3000 LLM machine is useless
>>
>>101455365
Reminds me of the shitty 7b finetunes that were "better" than GPT 3.5
Absolute copium by vramlets
>>
>Here at MistralAI we realize your time is very valuable to you. Which is why we have included both the sharded and unsharded weights in the repo to effectively double the download time. Have a nice day :3
>>
what's with mistral spamming all these useless tiny models
where's the big shit
>>
File: screenshot.png (203 KB, 934x475)
Anyone still using 70Bs is not being honest with themselves.
>>
>>101455395
it's an API model though?
>>
>>101455236
A good model would say to just give the whole chain to the landlord as collateral and pay the rent in cash once you have it.
>>
>>101455492
They're going to release it API-only :)
>>
>>101455492
it's always that way anon, they experiment on little turd models and once they find the good formula, they give us that turd and proceed to train a giant model they'll keep for themselves
>>
>>101455495
https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628
>>
>>101455461
>still coping about midnight miqu
You forgot your avatar, mikufag.
>>
>>101455490
Just don't download the whole repo
>>
>>101455471
t.too rich to try a 27b model
>>
i've noticed a whole new ism with Gemma27b.

"you think this is a game?" is something that pops up for me every single time there's even the SLIGHTEST conflict. i've seen it dozens of times by now.
>>
File: dge.jpg (379 KB, 3371x1715)
>>101455522
that's weird, it says it's a proprietary model on chatbot arena
>>
>>101455558
it was, they just open sourced it today
>>
>>101455548
I tried it
It's redditsmart
But it has few parameters, so it loses track of things in an RP and isn't able to follow logical conclusions as well as 70Bs. I'm not even that rich (36GB), but even G2 27B at Q8 is worse than an L3 70B finetune at 3.5bpw
>>
>>101455402
kek'ed
>>
>>101455551
It's just you.
>>
>>101455402
projecting your own desires huh?
>>
how many years until things get good?
>>
>>101449996
NTA but in my experience quant just kills the instruct capabilities without actually dumbing down the model too much

I've played around with a Mistral 0.2 2-bit quant where I managed to get it to give similar responses, for my purposes, as the original model by using the completion style like

Do X using Y below
(content)
Certainly! here is X using Y:

The completion works fine, but if you don't do that last step of starting the model's answer, it degenerates
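If anyone wants to replicate it, a minimal sketch against a llama.cpp server's /completion endpoint (default port assumed; the template is the one above):
[code]
import requests

content = "..."  # whatever you're transforming

# prefill the start of the answer so the quant can't derail into degeneration
prompt = f"Do X using Y below\n{content}\nCertainly! Here is X using Y:\n"

r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 512})
print(r.json()["content"])
[/code]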
>>
>>101455622
People like you are morons. Look a two year back and where are we now.. things did get already good.
>>
>>101455273
literally no one gives a shit about europoors.
they're poor for a reason.
>>
I'm looking for something that would take a given text file and, for each paragraph, estimate its clarity and suggest a way to improve it. Then it aggregates all results in a file sorted by ascending clarity score. I wrote a script to do it with ollama, but after running for 25 minutes it crashed because one of the replies was missing the clarity key in the JSON answer.

So to actually do that, it needs to re-run the query when the JSON is invalid, and to keep something like a hash of the query + paragraph to store intermediate results. A progress bar would be nice.

Is there already something to do this?
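If nothing ready-made exists, the retry + cache part I mean is something like this (rough sketch against ollama's OpenAI-compatible endpoint; model name and prompt are placeholders):
[code]
import hashlib, json, os, requests

API = "http://127.0.0.1:11434/v1/chat/completions"   # ollama default
PROMPT = ('Rate the clarity of the paragraph below from 1-10 and suggest an '
          'improvement. Reply with JSON only: {"clarity": int, "suggestion": str}\n\n')

def score(paragraph: str, retries: int = 3) -> dict:
    os.makedirs("cache", exist_ok=True)
    key = hashlib.sha256((PROMPT + paragraph).encode()).hexdigest()
    path = f"cache/{key}.json"                       # survives crashes/restarts
    if os.path.exists(path):
        return json.load(open(path))
    for _ in range(retries):
        r = requests.post(API, json={"model": "llama3", "messages": [
            {"role": "user", "content": PROMPT + paragraph}]})
        try:
            result = json.loads(r.json()["choices"][0]["message"]["content"])
            result["clarity"]                        # KeyError if the model forgot it
        except (json.JSONDecodeError, KeyError, TypeError):
            continue                                 # malformed reply, ask again
        json.dump(result, open(path, "w"))
        return result
    raise RuntimeError("model kept returning invalid JSON")

# wrap your paragraph loop in tqdm for the progress bar, then sort by "clarity"
[/code]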
>>
>>101455663
these models still sound like robots. good is when they don't.
>>
>>101455737
look into grammars, I know you can use them to enforce an output schema with llama.cpp, not sure about ollameme
>>
>>101455365
It's shills, bored schizos and riddlers who talk about it nonstop. Everyone already tried it and found it to be garbage.
>>
>>101455751
They just need to be finetuned on 4chan.
>>
>>101455769
pretty much yeah but I wouldn't say it's garbage, it's probably the best thing in its weight class, it's just not better than 70b+ models
>>
yep. It's another episode of nothing fucking works.
>>
>>101455663
yes, we have the same censored slop, nothing changed.
>>
>>101455798
>he doesn't know
>>
If I put two gemmas into mistralrs' anyMoE, will I get a better model, or a retarded fatter Gemma?
>>
>>101455764
This part works pretty reliably, even if it did fail once. It's more about defining a pipeline of tasks, with one task that should run once all intermediary jobs are done, and handling storage of intermediate results so it can pick up where it failed.
>>
>>101455857
write a python script
>>
>>101454907
>12B
oh my god who the hell cares?
>>
Gemma fucking sucks
>>
>>101455570
This kills the VRAMlet
>>
>>101455888
moi
decent quant will happily sit on my 3060
>>
Why do you guys even post on /lmg/ when you hate everything? Take pause.
>>
>>101455903
Why do you post on /lmg/ when you hate everyone since they post here while hating everything? Take pause.
>>
>>101455951
>"heh that'll show him" ahh reply
>>
Looking at benchmarks is bad for mental.
Just try the model, see that Gemma is fast but pretty retarded and move on.
I don't care if it knows how many siblings Sally has, it's stupid.
Qwen2 72b, L3 70B, CR 35B and basically nothing else. Of course fine-tunes of Q2 and L3 are great, CR doesn't even need a fine-tune. I'd recommend dawnbreak or daybreak L3 or however it's called. Banger of a model.
>>
>>101454907
>benches
Worthless until I can actually get good responses.
>12B
Lmao, the medium weight class is fucking dead.
>>
so like...anyone actually get mistral-nemo working yet? I keep getting tensor shape error.
>>
>>101455903
Mikufag became jaded and is shitting on every model that does well on benchmarks and the arena.
>>
File: 1690423585049383.jpg (105 KB, 908x1280)
another nail in foss ai meme coffin
>>
>>101455307
ooba has the latest exl2 version retard
>>
>>101455983
I have tried all of these models and settled on gemma-2-27b still being the best. CR 35B is not usable, by the way.
>>
>>101456078
Sad that google has fallen so far that they spam generals on 4chan with fiver jeets to try and look relevant.
>>
>>101456068
You are correct. It does run slower in my case though
>>
>>101456062
por que?
what benefit is there to having a portable, cloud-based LLM?
>>
>>101456134
Keep crying, miku.
>>
>>101456137
Exllama only has pipeline parallelism, vLLM is probably faster for multiple GPUs.
>>
Pls no larp.
Gemma has one benefit and that's speed. It's like saying 8B is faster, well yeah it is but it's also dumb. CR is the lowest I'd ever go for RP smut time. Otherwise 70B Q5 as the daily driver.
I've tried various miqus, mixtrals, gemma2, various l3s, CR, CR+, Qwn1/2, abliterated models and all kinds of other junk. If you really think G2 is good you may be retarded.
Just call it how it is, vramlet.
>>
File: 1714883181805436.png (3 KB, 368x53)
>>101456195
>obsessed
>>
>>101456195
>cr 35b
>better than gemma
obvious shitpost
>>
>>101456180
can you explain the difference and why vllm would be faster?
vLLM also doesn't support as many sizes of quants
>>
>>101456062
OpenAI is dying, I'm only using claude 3.5 sonnet now, it's the only model actually good at code, there was gpt4 march 2023 that was also actually good back then but we can't use it anymore so...
>>
>>101456223
>>101456226
Disprove with logs lads, I've posted my qwenny logs in this thread and last.
>>
>>101456062
mini version will be free btw
>>
>>101456267
Not local not interested
>>
>>101456267
Free if your information is worthless.
>>
I actually can't run any of this so I just collect cards and wait for the day.
>>
>>101456278
>not shit enough not interested
kek
>>
>>101456236
Because it has tensor parallelism? It doesn't run the GPUs sequentially, or something like that. It also uses the NCCL library, which makes better use of NVLink.
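If you want to test it yourself, tensor parallel in vLLM is a single argument (model name is just an example, assuming the weights fit across your cards):
[code]
from vllm import LLM, SamplingParams

# splits each layer's weight matrices across both GPUs and syncs via NCCL,
# instead of running the GPUs one after another like pipeline parallelism
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", tensor_parallel_size=2)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
[/code]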
>>
>>101456298
Any of it?
You could use a colab instance to run 8b at least, I'm pretty sure.
I think koboldcpp has a ready made colab notebook in their repo.
>>
I have a 3090 and a 850w power supply and just using a ryzen

what gpu can i plop in my 2nd slot for cheap that can fit 850w??? I just want more memory.....................
>>
>>101456343
>what gpu can i plop
you must be 18 or older and not have used reddit within the past 6 months to post here.
>>
>>101456343
a 3090
>>
>>101456078
CR 35B is bad at instructions but best for prose
>>
File: 1718052286137336.png (18 KB, 349x148)
18 KB
18 KB PNG
>>101456353
don't i need a beefier psu?

also i just realized my 2nd slot is just x2 lanes, which sucks ass
>>
>>101456365
It's not usable because it does whatever it wants, usually just porn.
>>
>>101456375
10 tokens a second doesn't need as much bandwidth as millions of vertex calculations 165+ times a second
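Napkin math backing that up, assuming a 70B-class model split across two cards (hidden size 8192, fp16 activations):
[code]
# with a layer split, all that crosses PCIe per token is one
# hidden-state vector at the point where the model is cut
hidden_dim = 8192                      # llama-70b hidden size
bytes_per_token = hidden_dim * 2       # fp16 = 2 bytes per value
tokens_per_s = 10

traffic = bytes_per_token * tokens_per_s   # ~160 KB/s
pcie3_x2 = 2 * 0.985e9                     # ~1.97 GB/s

print(traffic / pcie3_x2)              # ~8e-5, utterly negligible
[/code]
(Loading the weights onto the card in the first place is where x2 actually hurts.)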
>>
>>101456261
qwenny... uooh...
>>
>>101456393
hmm but i should probably just get a new motherboard that can be configured for x8 each, since 3090s go as cheap as $480 here nowadays
>>
>>101456393

NTA but what does NVLink do in this case if the 2nd gpu runs on x2 or x4?
>>
>>101456341
I like my privacy. But yeah, I ran one of the weaker models and it still took 3 minutes to generate nonsense.
>>
>>101456469
I think both would run at the lowest bandwidth
that's how SLI used to work
>>
>>101456469
It explodes.
>>
>>101456436
Are you in Taiwan or something?
>>
>>101456499
das stupid
can't it just use the x16 lanes
>>
>>101456478
>3 minutes
I can run gemma-2-9b Q_4_M at ~5 t/s on my laptop CPU with llama.cpp
>>
>>101456062
Llama 8B fags BTFO
>>
>>101456563
*Q4_K_M
>>
>>101456563
I don't know. I just ran a random model I found on huggingface with koboldcpp.
>>
>>101456617
Did you offload all layers?
>>
>>101456573
not local, and it will be censored to hell so I won't be able to RP with my waifu, unironically :(
>>
>>101456062
Can't wait to try my character cards with it.
>Tags: loli, bestiality, double penetration
>>
>>101456062
They really hit the wall, didn't they? They're just throwing shit out there to stay relevant after their failure with training GPT-5.
>>
27B SPPO when?
>>
>>101456703
I didn't fuck around with any settings because I didn't know what they do.
>>
>>101456821
this, that's the only thing I'm waiting for at the moment
>>
>>101456952
Learn to prompt, retard.
>>
>>101456859
Fair enough.
In the case of those layers, you want as many of them as you can fit inside your GPU's VRAM, so that's a setting you should change if you can.
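In koboldcpp that's the "GPU Layers" setting (--gpulayers on the command line). Same idea in llama-cpp-python, as a sketch (the filename is made up, point it at whatever GGUF you downloaded):
[code]
from llama_cpp import Llama  # pip install llama-cpp-python, built with CUDA

llm = Llama(
    model_path="gemma-2-9b-it-Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,   # -1 = offload every layer that fits into VRAM
    n_ctx=4096,
)
print(llm("Tell me a joke.", max_tokens=64)["choices"][0]["text"])
[/code]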
>>
File: 1709980355581910.jpg (79 KB, 1280x647)
79 KB
79 KB JPG
>>101456062
>>
>>101456062
When the fuck are they gonna release GPT-5? They still act as if they're the king of the AI world; that's not the case anymore, Claude 3.5 Sonnet is now the big dog. They will die if they don't step up their game
>>
>>101457145
>they will die if they...
release a bad model as GPT-5; that would signal they can't innovate, and that's why they won't, yet.
>>
>>101457145
GPT-4o was supposed to be GPT-5. It was so disappointing that they had to rebrand it under the GPT-4 moniker. "GPT-5" will only be released whenever it doesn't disappoint. Which will be never, so instead they will make up something about it being safer to do incremental updates from here on out and drop the GPT-x paradigm or some bullshit like that.

If they released GPT-4o as GPT-5 like originally planned the entire LLM industry would collapse and start a new AI-winter.
>>
Man. Chill. GPT-5 is in training. They'll BTFO everyone as soon as it comes out. The current competition is good. They'll be fine.
>>
how do I RP? I really have zero experience doing RP and I feel like I'm missing out on a lot...
>>
mistral-nemo-8x12b-SPPO-orthogonal when?
>>
>>101457233
Orthogonal is a meme. You don't want the model to be completely unable to refuse when the story calls for it.
>>
>>101457145
after the election
>>
>>101457214
is this as good as LLMs will ever get?
multi-modal and other frankenstein hybrids in the works would seem to suggest as much, unless some new factors are introduced
>>
>>101457145
>Claude 3.5 Sonnet is now the big dog,
I also tried this model, and I thought the claude models were way more cucked than the chatgpt series. I was pleasantly surprised when I realized it was the opposite. I'm not a murican, so I tried to understand why Crooks registered as a Republican even though he's a Democrat. Bing chat told me to fuck off, but Claude 3.5 Sonnet was willing to explain why (he wanted to vote for a Republican other than Trump in the primary to weaken that side). That's when I realized OpenAI is fucked if they don't react. Claude is less censored and better than OpenAI at the moment.
>>
File: F1OeOAzWYAEzmfM.png (129 KB, 723x666)
129 KB
129 KB PNG
>>101456758
>>
>>101456758
good thing it will reject your utterly shit tastes.
>>
Nemo GGUF https://huggingface.co/second-state/Mistral-Nemo-Instruct-2407-GGUF
>>
>>101457320
What? I thought people were reporting it didn't work.
>>
>>101457306
Ah, a man of culture, I see.

>>101457318
Nah man, I don't do scat.
>>
>>101457352
Got a tokenizer issue when I tried to convert it, yeah. Don't know how they did it, so it might be scuffed.
>>
>>101457355
pedoshit and bestiality aren't that far from scat though.
>>
>>101457276
>is this as good as LLMs will ever get?
A new architecture will BTFO transformers and a 7B model will be as good as GPT-4o. That's really likely; there's no way transformers are the dead end of machine learning, no way.
>>
>>101457214
>GPT-4o was supposed to be GPT-5. It was so disappointing that they had to rebrand it under the GPT-4 moniker.
I also believe that as well; they had no reason to strive for just a "slightly better version of GPT-4", no one cares about that stuff, and OpenAI used to go big every time. That was unusual of them. Maybe they reached their ceiling, but ClaudeAI definitely hasn't yet.
>>
>>101457320
>LlamaEdge, powered by Rust and WasmEdge, provides a strong alternative to Python in AI inference.
great, another ollama
>The WASI-NN ggml plugin embedded llama.cpp as its backend.
>>
>>101457320
Does not work with kobold. Anyway, someone on plebbit reports that the model is coherent at novel continuation at 128K.
>>
>>101457276
>is this as good as LLMs will ever get?
define "good". are we going to get big leaps in "intelligence" in pure text gen? probably not. seems we've pretty much hit the limits of what simple scaling can provide.

but there are lots of ways they can get better. there's dozens, hundreds of papers exploring ideas that would lower costs, increase control, etc. the engineering and compute can't keep pace with the research.

i think we've only scratched the surface of multi modality. it WILL result in increased intelligence and usefulness. then there's combining LLMs with other algorithms to better approximate cognition.
>>
>>101457276
>is this as good as LLMs will ever get?
If they keep pretraining their models with leddit and wokipedia, yeah, that's how far we can go. Especially leddit; that place is hell on earth.
>>
>>101457276
Until some breakthrough, yeah, that's what we're getting: 2-5% better performance with each new model. Safety is for fags and the space is sadly full of them.
>>
>>101457504
>>101457504
>>101457504
>>
>>101457416
Stop trying to make your scat fetish happen.
It's not gonna happen.
>>
>>101457534
>pedo projections
seems you love scat at the end of the day huh?
>>
>>101457495
>Safety is for fags and the space is sadly full of them.
That's why it was never my dream to work at a giant company like google; you have to sell your sovl and your morals to work in such a cucked environment
>>
>>101457549
>Arguing over whose fetish is worse
Shut up faggots
>>
>>101452686
There are few things that make me rage harder on 4chan than /poltards who say "you're not welcome here," as if they seriously expect the person they're talking to to give a flying fuck.

Get off the Internet, Cleetus, and go back to fucking your pig out in the barn.
>>
>>101457443
I mean, if you're trying to minimize effort, what else are you going to copy?
Though the main dev doesn't seem to have any FAGMAN connections, so this one is going to fail.
>>
>>101457253
I mean, even if it's not orthogonal, I'm assuming that an MoE 8x12B will be around the same performance as early GPT-4 versions (censored)
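Napkin math on what a hypothetical 8x12B would weigh, assuming a Mixtral-style layout (8 FFN expert sets, 2 active per token, attention shared; the ~78% FFN share is borrowed from Mistral 7B's proportions):
[code]
# mixtral 8x7b just copies the dense model's FFN 8 times and routes
# each token through 2 of the copies; same shape scaled to a 12B base
ffn  = 12 * 0.78        # one expert set         ~9.4B
rest = 12 * 0.22        # attention + embeddings ~2.6B

total  = rest + 8 * ffn   # ~77.5B to store  -> 70B-class VRAM bill
active = rest + 2 * ffn   # ~21.4B per token -> ~20B-class speed
print(total, active)
[/code]
So you'd still pay a 70B-class memory bill, it would just generate at 20B-class speed.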
>>
>>101456758
Do bestiality cards count if I'm playing the role of the dog?
>>
>>101457966
Just in case you don't already know them:
https://www.dlsite.com/maniax/work/=/product_id/RJ202234.html
https://www.dlsite.com/maniax/work/=/product_id/RJ182625.html
>>
hey /g/ - new to llms, just set up ollama and openwebui. it's working ok but I noticed my GPU isn't being used at all. I have an Nvidia 1080 Ti with 11G of VRAM using dolphin-mixtral:8x7b. nvidia-cuda-toolkit-12.5.0 is installed. how do I get ollama to use my gpu?


