/g/ - Technology


Thread archived.
You cannot reply anymore.




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103045507 & >>103038380

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1707500550725825.jpg (134 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103045507

--Cost-effective GPU options for a club inference rig:
>103052715 >103052772 >103052916 >103053194 >103053478 >103053504
--Transluce open-sources AI investigation toolkit:
>103047524 >103047575
--Techniques for generating comic panels using AI models:
>103051465 >103051506 >103051561
--Sovits quality issues and potential solutions:
>103052143 >103053933 >103054902 >103054946 >103054976 >103054990 >103055007 >103055039 >103055112 >103055123
--SmolLM2-1.7B-Instruct-GGUF performance discussion:
>103046664 >103046725 >103046877
--Recommended settings for EVA-Qwen-72b:
>103053678 >103053685 >103053853 >103053891
--Oasis model discussion and comparisons:
>103050901 >103051033 >103051044 >103051367 >103051464 >103052003 >103053693 >103055898 >103055913
--New QTIP quant method for Llama models:
>103053148 >103053432 >103053501 >103053622 >103053536 >103053578 >103053647 >103053603 >103053813
--KoboldAI Lite updates and troubleshooting discussion:
>103053393 >103053409 >103053496 >103053919 >103054012 >103054064 >103054133 >103054374 >103054027
--Hardware and quantization considerations for running AI models:
>103046482 >103046516 >103046693 >103046832 >103046668 >103046613 >103050826
--First tests of Ezo, an AI that speaks Japanese, show promise:
>103054961
--Discussion of model recommendations and VRAM requirements:
>103045564 >103056095 >103056118 >103046718 >103049829 >103050128 >103052315 >103050874 >103053655 >103053699
--Anon shares a high-quality Japanese voice actor model:
>103056589 >103056606 >103056644 >103056737
--Advancements in embodied AI and the potential for synthetic beings:
>103054157 >103054218 >103054325
--Miku (free space):
>103045564 >103046333 >103047988 >103049417 >103051299 >103054122 >103055225 >103055337 >103056077 >103056425 >103056693

►Recent Highlight Posts from the Previous Thread: >>103045519

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 023a3def6f9.jpg (465 KB, 1024x1024)
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q8_0.gguf (aka the best of the best)
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf

64GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q5_K_M.gguf
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf

48GB VRAM
Qwen/Qwen2.5-72B-Instruct-IQ4_XS.gguf
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf

24GB VRAM
Qwen/Qwen2.5-32B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1-Q4_K_M.gguf

16GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q6_K.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q6_K.gguf

12GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q4_K_M.gguf

8GB VRAM
mistralai/Mistral-Nemo-Instruct-2407-IQ4_XS.gguf
anthracite-org/magnum-v4-12b-IQ4_XS.gguf
TheDrummer/Rocinante-12B-v1.1-IQ4_XS.gguf

Potato
>>>/g/aicg

> fite me
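A quick sanity check on the pairings above: a GGUF's weight file is roughly parameter count × bits per weight / 8, with headroom needed on top for KV cache and context. A rough sketch — the bits-per-weight figures are approximate ballpark numbers, not exact values for any specific file:

```python
# Approximate effective bits per weight for common GGUF quants.
# Rough community ballpark figures, not exact for any given file.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "IQ4_XS": 4.3,
}

def gguf_gib(params_billions: float, quant: str) -> float:
    """Approximate GGUF weight size in GiB (excludes KV cache/activations)."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 2**30

for params, quant, vram in [(72, "Q8_0", 96), (72, "Q5_K_M", 64),
                            (72, "IQ4_XS", 48), (32, "Q4_K_M", 24),
                            (14, "Q6_K", 16), (14, "Q4_K_M", 12)]:
    print(f"{params}B {quant}: ~{gguf_gib(params, quant):.1f} GiB "
          f"of {vram} GiB VRAM")
```

e.g. 72B at Q8_0 comes out to roughly 71 GiB of weights, which is why it's the 96GB pick: the remainder goes to KV cache and context.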
>>
File: file.png (67 KB, 1630x370)
You think Gemini 2.0 is gonna beat OpenAI? They're already very close
>>
>>103057373
yeah ok dude you could have just told me to fuck off when i asked and not be a dick about it.
>>
>>103057388
No lol it's a meme leaderboard
>>
>>103057417
>meme
as opposed to which non-meme leaderboard, exactly?
>>
File: 1711690590289518.jpg (93 KB, 800x600)
>>103057388
No.
>>
>>103057422
Because if you had used Gemini 1.5 Pro, you would know how retarded it is.
>>
>>103057388
i'm unironically expecting an exponentially significant breakthrough soon but not from google lmao
>>
>>103057427
That was not my question tho, i wanted a non meme benchmark or leaderboard
>>
>>103057441
Isn't that supposed to be the "live" thing that everyone is implementing now? Like ChatGPT Search, Perplexity, and now Google has introduced grounding into AI Studio, I think?

Did anyone try those? Are they useful for work or research, or just a meme?
>>
>>103057373
thank you babe
>>
File: ceba623027e1.jpg (196 KB, 622x673)
--- A Measure of the Current Meta ---
> a suggestion of what to try from last thread

96GB VRAM
mistralai/Mistral-Large-Instruct-2407 (aka Largestral)
mradermacher/Luminum-v0.1-123B-GGUF

64GB VRAM
bartowski/Llama-3.1-Nemotron-70B-Instruct-HF-GGUF/Llama-3.1-Nemotron-70B-Instruct-HF-Q5_K_L

48GB VRAM
bartowski/Llama-3.1-Nemotron-70B-Instruct-HF-GGUF
bartowski/EVA-Qwen2.5-72B-v0.0-GGUF

24GB VRAM
bartowski/c4ai-command-r-v01-GGUF/c4ai-command-r-v01-Q4_K_M.gguf
bartowski/gemma-2-27b-it-GGUF/gemma-2-27b-it-Q5_K_L.gguf
TheDrummer/Gemmasutra-Pro-27B-v1-GGUF

16GB VRAM
bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF/Mistral-Small-22B-ArliAI-RPMax-v1.1-Q4_K_L.gguf

12GB VRAM
TheDrummer/UnslopNemo-12B-v3-GGUF/Rocinante-12B-v2g-Q5_K_M.gguf

8GB VRAM @ 30 GPU Layers (75% GPU offload)
Lewdiculous/MN-12B-Lyra-v4-GGUF-IQ-Imatrix/MN-12B-Lyra-v4-Q4_K_M-imat.gguf
mradermacher/Arcanum-12b-i1-GGUF/Arcanum-12b.i1-Q4_K_M.gguf

Potato
>>>/g/aicg
> or toppy 7b

Use:
koboldcpp
LM Studio
oobabooga/text-generation-webui
>>
>>103057388
So do you guys show your ass to the gay bear or?
>>
File: file.png (96 KB, 1465x892)
Ecker's doing it again: falling for yet another meme for his TTS model...
>>
>>103057373
at least create your own miku next time copy poster
>>
>>103057528
>using Bing to create Mikus in the local thread
>unironically shilling Lyra, the one with the anti-merge license
I think you're the one without shame, Mikufag.
>>
>>103057373
>nothing for 40xH100 anons
ngmi
>>
Imagine if, by like 2035, we combined all the algorithmic advances, quantization methods, and inference optimizations, and it meant you could run AGI on a Raspberry Pi 3B with 4GB.

Wouldn't it be very depressing to realize your old shitty laptop from 2010, the one you played TF2 and L4D2 on, had enough processing power and RAM to emulate a human-level intelligence?

I sometimes also wonder about a timeline where the Soviet Union never collapsed and instead adopted the internet in the 1990s and focused on computer technology as a new "space race". How would the current AI race look, and what would the implications for the world be?
>>
>>103057449
no i mean like either a huge jump in intelligence or a giant reduction in training/inference compute
>>
>>103057539
I'm not "shilling" anything; an anon in the last thread showed that that's what he used.

I actually went through the thread and put in the effort.

Whereas some people want to stop anyone trying to help, because of their selfish belief that knowledge should be hoarded and have a price tag put on it.

Well, fuck that.
>>
>>103057608
Oh
Really? Literally all I've read everywhere for like a year now is that it's plateauing.
>>
>>103057602
The Soviets actually had their own computers based on ternary logic. But they didn't have the population or money to last much longer than they did.
>>
File: file.png (172 KB, 693x767)
>>103057367
AyyMD OLMo, babby's first model
https://www.amd.com/en/developer/resources/technical-articles/introducing-the-first-amd-1b-language-model.html
https://huggingface.co/amd/AMD-OLMo-1B
just dropping this here idk what anyone would do with this information
>>
>>103057602
With how things are, humans will reach cat IQ faster than AI will reach human intelligence.
>>
>>103057620
They also had water-based analog computers that were more efficient than digital computers for the integral calculus needed for the space race, up until the mid-1970s. It's kind of insane that the US was trying to figure out how the Soviets were ahead in computing in the early 1970s, despite the CIA knowing their electronics were behind. It was a big mystery, and you had all kinds of theories about the Russians faking their shitty electronics to fool US spies. Only after the Soviet Union collapsed did it come out that they just had very weird water-based analog computers.

https://en.wikipedia.org/wiki/Water_integrator
>>
>AMD Ryzen AI 9 HX 370 w/ Radeon 890M
Are there industrial embedded motherboards with this CPU? It seems very fast for an iGPU, and it would be great if it could use 64GB of RAM.
>>
File: file.png (48 KB, 486x737)
>>103057637
WTF is that eos token
>>
>>103057690
wtf indeed
>>
File: 1730351457555265.png (103 KB, 765x384)
>>103057690
>>
>>103057690
oh, I guess that's a mistake, the eos is token 0.
>>
>>103057690
>it's real
What in the tarnation
>>
>>103057690
the first ever GLM (Glowing Language Model)
>>
>>103057442
It doesn't exist because of the inherent nature of leaderboards. The second a leaderboard is introduced in an environment like this is the second everyone starts trying to game the benchmarks instead of making better models. So, every benchmark becomes a meme benchmark by default.
>>
>>103057637
Is the bartowski quant broken? Why would they release this?
>>
Are QTIP quants good for you?
>>
File: IMG_0898.jpg (324 KB, 1320x414)
NuClaude is hell. I can’t live like this.
>>
>>103057900
You wanted the boring assistant thing?
>>
>>103057931
I don’t want to interact with something less autistic than me. It’s unsettling.
>>
>>103057900
I prefer NuClaude to the boring ChatGPT that only speaks in lists. However it seems they're trying to make ChatGPT "cool" too
>>
File: buy-a-fucking-ad-asshole.jpg (396 KB, 1664x2432)
>>103057367
--- A Measure of the Current Meta ---
> a suggestion of what to try from last thread

>196GB VRAM
Qwen/Qwen2.5-72B-Instruct BF16

>96GB VRAM
Qwen/Qwen2.5-72B-Instruct Q8_0

>64GB VRAM
Qwen/Qwen2.5-72B-Instruct Q5_K_M

>48GB VRAM
Qwen/Qwen2.5-72B-Instruct IQ4_XS

>24GB VRAM
Qwen/Qwen2.5-32B-Instruct Q4_K_M

>16GB VRAM
Qwen/Qwen2.5-14B-Instruct Q6_K

>12GB VRAM
Qwen/Qwen2.5-14B-Instruct Q4_K_M

>8GB VRAM
Qwen/Qwen2.5-7B-Instruct Q5_K_M

Potato
>>>/g/aicg
>or Qwen/Qwen2.5-0.5B IQ2_XXS

Use:
vLLM
llama.cpp
tabbyAPI
>>
https://github.com/lyogavin/airllm
>70B on 4GB GPU
wtf
>>
File: 1694655604976846.gif (1.24 MB, 480x366)
>>103057983
>>
>>103057983
High quality post and nice Miku
>>
File: AMD.png (40 KB, 817x912)
>>
>>103058013
It's like I'm back in the 2.7b pyg days.
>>
what model can I put in 32gb vram?
>>
>>103058032
The base seems to be GPTNeoX so it's very close to pyg
>>
>>103058013
I’m crying
>>
File: IMG_0904.jpg (386 KB, 1320x1469)
>>103058013
It’s time
>>
>>103058072
yeah nobody actually gives a fuck now.

>>103057367
/lmg/ can go die in a fire along with all the fucking retards who give a shit about AI

What's going to happen when a fucking AI takes all your job and you die of starvation?

Or when it fucking crashes the economy so money isn't worth a shit any more?

yeah have a good laugh now while you still can, lets see if you survive the next layoff wave.
>>
>>103058285
Nah. It's gonna be fine.
>>
File: sensible-chuckle.gif (992 KB, 250x250)
>>103058285
>>
Thanks to whoever recommended SoVITS for TTS, but it occasionally generates things like this. Is it supposed to be doing that?
https://vocaroo.com/1oj8DMm2i1CK
>>
>>103058285
good. goyim should starve.
>>
>>103058285
What layoff? We all work in AIML research here
>>
>>103058285
>muh job
>muh layoffs
I have almost half a brain so I milked this thing for all it’s worth from the beginning and never have to have a job again
Sucks to suck
>>
>>103058285
Hi, Sao.
>>
>>103058285
/lmg/ is so dead
>>
>>103058368
Improve your audio reference, match the tone of your prompt, play with the samplers
>>
>>103058368
Is that any good?
I have had some success with xtts2 but sometimes it also just outputs garbage.
>>
>>103058451
Grifting homos like you are the reason everything is shit now
>>
>>103058484
but it's just saying random things in Chinese
>>
>>103058285
Sorry man, unless humanity stagnates, if it ain't AI displacing people from their jobs, it'll be some other form of automation.
That's what we as a species have been doing ever since the industrial revolution.
>>
>>103058503
You got the right weights?
>>
>>103058495
Nah I’m one of the good ones actually
>>
>>103058503
nta. Not sure if this is your problem. Once when i was testing it i forgot to set the input language in the top-right, next to the reference text, and only set it in the output. It would spell out the output text before actually speaking it. I suppose you set both language dropdowns, but still worth a check. Sharing a screenshot of your settings *of when the gen failed* would be useful.
>>
>>103058523
i got it to speak english but it just says this about the weights
https://vocaroo.com/1dUOlsSOYDX6
>>
>>103058602
Ttsfags are extra retarded; can't be bothered to spoonfeed you
>>
File: pic.png (2.25 MB, 1869x1346)
>>103057367
fugg I love local AI
>>
File: icy4u.jpg (201 KB, 570x380)
>>103058632
https://github.com/danielgatis/rembg
>>
is this the new imggen thread?
>>103057367
nice gen
>>
>>103057637
I guess there's no rest for nala tonight.
>>
>>103058716
it's 1B, automatically ain't worth shit
>>
>>103058723
only Nala is qualified to make that assessment.
>>
>>103057999
Why is nobody talking about this? It's being totally ignored.
>>
>>103058732
>mom. i said the thing the other boys were saying... see? see? i'm like them!
>>
>>103057427
I never used it but Gemma 2 9b and 27b are very good for their sizes so I don’t think they would fuck up gemini after making that, right?
>>
File: Olmo nala.png (99 KB, 942x387)
Nala test for the SFT DPO version of Olmo... and yeah... (I'll try putting in the glowy bos/eos token; for now I'm just using a generic ChatML template.) So far it's about what you'd expect from a 1B model... of the Llama-1 generation. It can't sort out who is who.
>>
>>103058800
somebody should draw this in ms paint
>>
>>103058800
That's like trying to test a toddler on college stuff
>>
>>103057983
Fuck this bait post, Largestral destroys Qwen2.5.
>>
>>103058368
>https://vocaroo.com/1oj8DMm2i1CK
You still have it set to chinese, don't you
>>
>>103058800
Yeah the response is about the same when using the amd hackjob version of ChatML as it is just using a proper ChatML template.
>>
>>103058602
>i got it to speak english but it just says this about the weights
You need insanely clean reference audio or it sounds like shit. Either re-record in a silent environment with a good mic or run it through some noise reduction software. Then make sure the "Text for reference audio" is a letter-perfect transcript of what is being said.
>>
>>103057999
>>103058732
Much slower than CPU + RAM. All it does it run one unquanted layer at a time.
>>
>>103057999
>>103058887
NTA but 405B on 8GB vram seems like a neat proof of concept from a technical standpoint if nothing else. And fun for the world's most patient enthusiast lmao
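For anyone curious how the trick works: airllm-style inference streams the model one transformer layer at a time, so only a single layer's weights ever occupy memory at once. A toy sketch of the concept — the function names here are made up for illustration, this is not airllm's actual API:

```python
import random

def make_layer(dim: int):
    """Stand-in for reading one layer's weight matrix off disk on demand."""
    rng = random.Random(dim)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

def apply_layer(weights, hidden):
    """One dense layer: hidden @ weights (no activation, for brevity)."""
    dim = len(hidden)
    return [sum(hidden[i] * weights[i][j] for i in range(dim))
            for j in range(dim)]

def streamed_forward(num_layers: int, hidden):
    # Only one layer's weights exist in memory at any moment; in the real
    # thing each load comes from disk/NVMe, which is why it's so slow.
    for _ in range(num_layers):
        weights = make_layer(len(hidden))  # "load" the layer
        hidden = apply_layer(weights, hidden)
        del weights                        # free before loading the next
    return hidden

print(streamed_forward(num_layers=4, hidden=[1.0, 0.5, -0.5, 0.25]))
```

The tradeoff is obvious: every token re-reads every layer from storage, so throughput is bounded by disk bandwidth instead of VRAM bandwidth.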
>>
>>103057484
The thread needs this post, except in English.
>>
>>103058905
If you're patient enough you could already do it on swap.
>>
>>103058873
You're right that the reference is unclean.
>>
>>103057484
For potatoes, just install LM Studio and download Mistral.

Idk why that's not in the pasta...
>>
>>103057999
Well, it's not like it's impossible. Just extremely slow.
>>
>>103058285
The elites always need slaves to do slave work
>>
>>103057572
I believe some anon here has 40xh100
>>
>none of these lists recommending Storybreaker-Ministral
That's how I know these lists are fake.
>>
>>103058920
I imagine this doesn't render the computer unusable while it's thinking like the swap method would, though
>>
>>103058822
>sunk cost fallacy
It's time to let go.
>>
>>103058965
nah I'm not a richfag, I only have 36gb vram
Largestral is just a better model, even at low quants
>>
>>103058013
bizarrely nostalgic
is AMD that far behind
>>
>>103058940
The elites need more kids to rape, so they'll at least keep some people around.
>>
>>103058976
Nah, it's even worse for people with low VRAM because of how exponentially worse they get below 4 bits.
>>
>>103058976
What quant are you running? Same VRAM. Even at IQ2_M it feels really slow for me.
>>
>>103058989
poor people can also rape kids
>>
>>103059013
IQ3_XXS, it gets too dumb below that. No model can survive dropping below 3 in my experience
I get just under 2 tokens/sec with 65 layers on the GPU and a 16-core 5950X
>>
Best Free LLM Proxy
https://api.pawan.krd/cosmosrp/v1
put a single space in the API key field
>>
File: MikuNotAgain.png (1016 KB, 1200x848)
>>103059078
>cloud shite
Wrong thread
>>
Speaking of cloudshit, while playing around with hailuo after discovering that they finally added image-to-video as well as official release outside of China I've noticed when it's not in queue it's way faster than Luma or Kling... which leads me to believe it's not an exceptionally gargantuan model... so maybe we'll get good local videogen at some point.
>>
>>103059078
Sooner or later that shit is going to implode
>>
>>103059119
It's quite obviously being funded by big money. It's clearly not stolen.
Day 1 that the key proxy repo was put up on HF the community manager took it down due to violating the rules, and then it was back up shortly thereafter without a further peep about it. They clearly exist for research purposes with a lot of money behind them. My guess is people are providing human training data for further censorship.
>>
>>103058013
Cactus
>>
>>103059078
what's the model?
his github claims 3.5-turbo but that's been deprecated off the API now, so it has to be something else
>>
>>103059141
I didn't say anything about stolen keys. Things like this glow like a supernova. Sooner or later it's going to implode in a far worse way than "oh no, he went to jail :(". People are going to die. I will not elaborate.
>>
>>103059256
>People are going to die. I will not elaborate.
did you learn to be this dramatic from sam altman
>>
>>103059141
https://x.com/pawanosmant/status/1743661803110908119
>>
>>103059323
"If it's free, you're the product"
Who falls for this shit?
>>
>>103059341
obviously but the deal is fair
>>
>>103059266
No it’s just a side effect of the medication
>>
>>103059323
oh so it's openai
not worth using for RP or cooming, even for free
>>
File: grift.png (135 KB, 612x549)
>>103059323
I'm not entirely sure he understands the words he's using.
>>
>>103059323
>sponsorship
lol he’s calling the basic bitch new user credits a sponsorship
If someone sent them a strongly worded email they’d probably purge his account
>>
How's the ST situation? Nothingburger? Forked? Migrated to something else?
Been away for about a month or so; I saw that the ST devs were trying to get more corpo-friendly by removing a bunch of stuff.
>>
>>103059388
>Nothingburger
yes
>>
File: 1723771660587148.png (14 KB, 694x632)
>buy AM4 mobo
>put in spare 5600GT and 128GB of RAM
>run larger models using vulkan
How bad could it be?
>>
>>103059441
>5600GT
Memory Support: DDR4
Rated Speed: 3200 MT/s
Memory Bus: Dual-channel
ouch
>>
>>103059450
does it really matter if you run the model on ram/cpu?
>>
>>103059458
for inferencing it's not that big of a hit
for processing? thousands of times slower
>>
>>103059458
picrel would be an acceptable spec for pure cpu inference if paired with a 24gb+ GPU for context processing
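The back-of-envelope behind that: token generation is memory-bandwidth-bound, since each token streams roughly the whole active weight set through the CPU, so an upper bound on speed is bandwidth divided by model size. The figures below are theoretical peaks; real throughput is lower:

```python
# Upper bound on CPU token generation: memory bandwidth / bytes per token.
# Bandwidth is the theoretical peak for the channel config; real-world
# numbers come in under this.
def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# 2 channels * 3200 MT/s * 8 bytes per transfer = 51.2 GB/s peak
ddr4_dual = 2 * 3200e6 * 8 / 1e9
model_q4_70b = 40.0  # ~40 GB for a 70B Q4-class quant

print(f"~{tokens_per_sec(ddr4_dual, model_q4_70b):.1f} tokens/s upper bound")
```

So dual-channel DDR4-3200 tops out around 1.3 tokens/s on a 70B Q4 even before prompt processing, which is compute-bound and far worse without a GPU.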
>>
Is there a good local fine-tuned LLM that bypasses AI detection for assignments? I have to write a pretty long thing about a topic I don't care much about, but if I use GPT they're going to find out (the professor uses GPTZero).
>>
File: 1715062579295070.png (40 KB, 857x179)
Can't even make my own bot wet, damn it's fucking OVER
>>
>>103059520
no
>>
File: IMG_20241102_154912.jpg (129 KB, 765x605)
>>103059078
It's glowing like the fucking sun. I joined their Discord to ask the owner himself about the legality of the model, but this is the response.
>>
Has anyone tried using LLMs to practice/learn Japanese? I'm still a beginner, and I'm wondering if I can use an LLM to practice writing and reading sentences in a grammatically correct, but not stilted manner.

Maybe it's just a matter of having a good character card that instructs it to avoid using Kanji or other higher-level things?

Any recommendations are appreciated.
>>
>>103059616
SIR DO NOT REDEEM DISCORD LOGS SAR
>>
>>103060042
Ezo 72B is perfectly fluent, and after using it extensively over the last couple of days I'm confident it can be prompted to do what you want. You may want to combine it with the SoVITS TTS engine for listening practice.
>>
>>103059616
What a shady fucker. Anyone who uses that deserves what they get.
>>
>>103057983
Thanks Xi
>>
>>103059078
This picture looks AI generated
>>
>>103058942
Only for the weekend because his boss said it's ok.
>>
>>103060226
I imagine there's a bushel of fingers behind the camera. Also, he has no ears. How does he keep his glasses on?
>>
>>103050760
>>103050827
Would you all mind sharing settings/system prompt? The amount of times I’ve seen “impish grin” is bothering me more than I’d like.

>>103050826
Which 70B? I just tried the new RPmax at 5bpw and was not very impressed at all.
>>
Can burgers finally stop pretending democracy is real?
>>
>>103060081
Thanks for the suggestion. Have you tried the 32B version? I won't be able to run the 72B on a 3090.
>>
>>103057983
>Only Qwen 2.5 matter
is this a meme or is it true? kek
>>
>>103060666
as far as the chinese government is concerned, yes
>>
>>103060666
no qwen is a meme
use largestral instead
>>
>>103060911
USE MY ANUS INSTEAD
>>
File: ChatBIT.png (515 KB, 751x1000)
Local models have been WEAPONIZED

>Nov 1 (Reuters) - Top Chinese research institutions linked to the People's Liberation Army have used Meta's publicly available Llama model to develop an AI tool for potential military applications, according to three academic papers and analysts.
>In a June paper reviewed by Reuters, six Chinese researchers from three institutions, including two under the People's Liberation Army's (PLA) leading research body, the Academy of Military Science (AMS), detailed how they had used an early version of Meta's Llama as a base for what it calls "ChatBIT".
>The researchers used an earlier Llama 13B large language model (LLM) from Meta (META.O), opens new tab, incorporating their own parameters to construct a military-focused AI tool to gather and process intelligence, and offer accurate and reliable information for operational decision-making.
>ChatBIT was fine-tuned and "optimised for dialogue and question-answering tasks in the military field", the paper said. It was found to outperform some other AI models that were roughly 90% as capable as OpenAI's powerful ChatGPT-4. The researchers didn't elaborate on how they defined performance or specify whether the AI model had been put into service.
>>
File: lightyear.jpg (435 KB, 2048x2048)
>>103060964
>Llama 13B
>accurate and reliable information for operational decision-making
>>
>>103057373
>anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf
Better than miqu? I could finally retire it, it's been serving me for over a year
>>
>>103060964
>llama 13B
what kind of retard worked on this psyop?
>>
>>103058800
Holy fuck.
>>
>>103060964
>considering the information you have provided me with, I would suggest to launch nukes to stop the capitalist threat
>the next step should be to launch the nukes
>finally, after you have accessed the situation and asked for the US' consent, you should launch the nukes
>>
>>103058800
this is the worst I've seen a model perform on the Nala test
why can't AMD do anything right?
>>
File: 1714372708048886.jpg (33 KB, 600x632)
>realized that ollama models are by default q4 quantized
ollama bros... I don't feel so good
>>
>>103061042
Doesn't matter. It's easy to use. It's the Linux to llama.cpp's GNU.
>>
>>103061042
It makes sense.
Less bandwidth cost for LLM tourists trying baby's first local.
Q4 isn't completely lobotomized and runs fast.
Also,
>Exam 1
This response is more than correct and should earn 11 points.
>>
>>103059518
mindbroken
>>
>>103061078
He forgot to describe
>>
>>103060964
This is a Chinese psyop. Its aim was to drive the U.S. to impose heavy regulations on the machine learning field, effectively crippling research capabilities.
>>
>>103061078
It's not even q4_K_M though if I remember correctly, it's q4_0.
>>
>>103061191
Obviously. The question is whether they'll fall for it, egged on by OpenAI hoping for a regulatory moat.
>>
>>103061202
If it works, it works.
>>
>>103061202
If you're using ollama you shouldn't have any standard so it's fine
>>
>>103061368
I mean, on their YouTube channel they literally say that you should always run the smallest quant possible since it's not worth waiting a few extra seconds.
>>
>>103061388
I started my first local model with ollama after watching one of the videos... and switched over to kobold after half a day.
>>
>>103061388
>ollama
>YouTube
are you just completely incapable of independent thought?
>>
>>103061158
There is no describing that which speaks for itself.
What, is this one of those classes where they want you to do
//Find out if list is empty.
//Get lengthOfList.
final int lengthOfList = list.size();
//Test if lengthOfList is that of an empty list.
if(lengthOfList==0){
//Print true if true.
println("true");
}else{
//Print false if false.
println("false");
}

instead of a one liner?

>>103061202
I think so, at least for older models back when I let Ollama be my baby's first for about a week and then went Kobold and never looked back, except I think I still use one Ollama model file because it hasn't been awful and what works works.
Well, really I tried Ooba before that but that was when 1B and 2B were SOTA instead of whatever weird thing brought those designations back this week.

>>103061406
For a normie, using a local LLM at all is "independent" of the main stream, who just say "hey siri google alexa, send food."
>>
>>103061042
You can download different quants in their website.
It is a bit hidden but it’s there.
>>
>>103061040
>why can't AMD do anything right?
Honestly while it's perplexing why they would bother releasing this in the current landscape- I kind of get where they are coming from.
They obviously wanted to start from scratch themselves, internally, so they started with the basics. Training a small model from scratch. If they made this thing... 2 years ago maybe... it would be alright. "Hey cool AMD did an AI"
But they should have kept it to themselves until they had something more like Nemo. And they could have cited this precursory model in a paper or something or made it available for shits and giggles. But yeah. Unless you're presenting new novel architecture there's no reason to even bother releasing something like this in 2024.9
>>
>>103061570
That was the moment when I realized that instead of going to their site I could go to HF and use what the cool kids on /lmg/ were using.
And that's how I became a cool kid, too.

>>103061584
AMD is Nvidia's cousin. They only need to make a token effort, enough of a technicality of competition in the market, to keep monopoly charges at bay. If anything, the only AI they should earnestly invest in is game-performance-related, so they can become Nvidia's flip side: picking up all the gamers who can't get Nvidia cards because AI cleaned out the top shelf of the market, and who will take AMD's second-rate offerings because they're there.
>>
Will a 100B BitNet model actually fit in 24GB, or does extra fat come with it, like the KV cache, embeddings, or attention buffers, that makes it not quite fit?
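A sketch of the arithmetic. The weights alone fit easily at 1.58 bits, but an fp16 KV cache at long context can eat the rest; the layer/head/context figures below are hypothetical but plausible, since no 100B BitNet release exists to measure:

```python
# Weights: 100B parameters at 1.58 bits each.
params = 100e9
weights_gib = params * 1.58 / 8 / 2**30

# KV cache: assumed 100 layers, GQA with 8 KV heads of 128 dims,
# fp16 (2 bytes) per value, K and V both cached, 32k context.
# All made-up-but-plausible figures for illustration.
layers, kv_heads, head_dim, ctx = 100, 8, 128, 32768
kv_gib = layers * 2 * kv_heads * head_dim * ctx * 2 / 2**30

print(f"weights ~{weights_gib:.1f} GiB, 32k fp16 KV cache ~{kv_gib:.1f} GiB")
```

Under these assumptions the weights come to roughly 18 GiB, so they fit in 24GB, but a full 32k fp16 KV cache adds another ~12 GiB; you'd need a shorter context or a quantized KV cache to stay inside the card.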
>>
File: m4max.png (28 KB, 900x276)
It's so bad out there for AI hobbyists right now, it's crazy. You have to choose between a designer shitbox and a rig of heaters, both with a 400% markup.
>>
>>103061584
>should have kept it to themselves
Because it ruined your day to see a nothing mentioned? You're only hearing about it because someone saw a random blog post and said "look they did a thing".
>>
>>103062045
When is it getting better?
>>
>>103062084
Did it work? are you a real woman now?
>>
>>103062087
Not anytime soon. But you're already familiar with LLMs; you're supposed to be making money off them to buy better hardware.
>>
>>103062045
>1.4kw for the GPUs alone
I power-limit mine to 200W each and they still retain about 75-80% of stock performance for both training and inference.
A quad 3090 rig (4090s if you're a richfag, P40s if you're a poorfag with the technical skills to make them work) built on a server platform is the standard for going big at home. Anything else is contrarian nonsense. A Mac can't train and has absolutely no upgrade/expansion path; it's only an option if you're okay with living in the pod and eating the bugs.
>>
>>103058285
You're blaming societal issues on a bunch of mathematics.
>>
>>103060666
It's true.
>>
>>103058285
>What's going to happen when a fucking AI takes all your job and you die of starvation?
I know how to feed myself and live off the land.
>Or when it fucking crashes the economy so money isn't worth a shit any more?
The west is the great satan and I have been waiting all my life for it to collapse.
>yeah have a good laugh now while you still can, lets see if you survive the next layoff wave.
The people who cling to toil and materialism will have the worst time. Stop projecting your fear onto others. Not everyone is a pussy-ass bitch like you who would just lay down and starve when push came to shove.
>>
>>103058011
Quit samefagging nigger
>>103060666
Zoomer discovered gatekeeping, idk why no one ridicules him for this.
>>
>>103062087
I am currently assisting a company in designing RISC-V processors for parallel workloads (by weighing in on which features would be useful for machine learning applications).
Presumably there are other companies doing the same thing so
>two more years
>>
I still don't get how this shit works. I put in words, then my GPU does millions of calculations and makes more words coherently, and now my dick is spraying semen out. It makes no sense.
>>
For other people that are using nemotron, are you having issues with it constantly trying to put things into bold and using weird formatting? I dunno why but it's constantly using brackets and double asterisks. Neither of those things are anywhere in my sysprompt or cards.
>>
>>103062371
damn, on one hand it's really cool, on the other... seriously? There's no one who knows how to do this, so they had to ask for help from a physics student who has AI as a hobby?
>>
>>103062410
Not only are there not that many, those that are available are expensive as all fuck. The only thing more valuable than GPU clusters is experienced personnel.
>>
>>103062191
Given responses like >>103060666 I approve of his post.
>>
>>103060603
I haven't tried it. That 30B class tends to be cursed. I'd personally take the speed hit and run partial offload of a Q5.
>>
>>103062395
Yes, Nemotron was tuned to 'think' things through using bullet points and lists so it'll always try to implement those in its replies.
>>
>>103062951
Isn't that what "stepped thinking" addon does in ST? Wonder if it works with Nemotron.
>>
>>103057367
the real picture is a classic
>>
>>103059493
I see
>>
Why are vision models so bad? Even top proprietary ones on lmsys arena can't OCR a paragraph or random words without making mistakes. They always hallucinate new ones or skip some.
>>
>>103061289
Anthropic and OpenAI are playing such a stupid game amping up the "risks" to 11 and asking every freaking time for "regulation".
The day this shit explodes in their face I will laugh, even if it impacts me.
>>
>>103063172
OCR was already a thing before AI and "better OCR" doesn't sound that interesting to investors
>>
>>103063172
>can't OCR a paragraph or random words
OCR is a solved problem with classical algorithms. Use the vision models for their novel abilities, e.g. inferring what people are doing in a scene or telling you about composition, etc.
>>
>>103063222
Yep, using the correct tool for the job instead of dumping everything on transformer models
>>
>>103062045
>M4 Max supports up to 128GB of fast unified memory
Would the M4 max actually be usable/worth it for local LLM?
>>
>>103063268
>Would the M4 max actually be usable/worth it for local LLM?
It remains to be seen whether there is any dedicated hardware on the M4 that can compete with a GPU for context processing. The prices of the Mac stuff are high enough to go another route that doesn't kneecap you every time the backend needs to process context for some reason.
If you could pick up an older Mac Studio with a 192GB M2 Ultra for a reasonable price, it might make a good RPC backend in conjunction with a regular PC GPU rig, but that's the only scenario where I think the hermetically sealed Apple monoliths are worthwhile.
>>
>>103063328
>older mac studio with 192gb m2 ultra for a reasonable price
Yeah I've looked into this, prices are prohibitive for this stuff to this day sadly.
>>
>>103062126
>P40
It's not like that anymore. These days, it is hugely overpriced e-waste.
>>
how to use local models to "humanize" text to write my phd thesis on gender studies
>>
>>103062371
>>103062421
I think Tenstorrent is doing 600 RISC-V cores with vector extensions, and it's basically a cheapo machine learning setup. You can buy one of their cards brand new for somewhere between $500 and $1000, with 24GB of VRAM. It's not worth it relative to a used 3090 since you'd also need to write all the software you need for it, but it's the future!
>>
>>103063372
>Yeah I've looked into this, prices are prohibitive for this stuff to this day sadly.
on ebay for sure. there are some that pop up on eg. facebook marketplace from time to time. There's a 128gb one near me for under $4k. Still not worth it imo, but it might be to someone else vs picking up a few more 48gb gpus if their box is already getting maxed out.
>>
>>103063389
Yeah it's unfortunate.
And it doesn't support bnb quantization which = no qlora
but if you absolutely have to have a GPU server for running LLMs and you're an utter poorfag it's basically what you get.
>>
>>103063403
My kids use our local AI rig to help with schoolwork, and telling it to essentially "Write like a 10th grader midwit" seems to get acceptable output the teacher doesn't catch.
>>
>>103063204
>>103063222
Lol what a total meme technology. Completely fucking useless for doing anything productive.
>>
>>103063477
>Already cheating in 10th grade
Grim
>>
>>103063499
Retard
>>
>>103063499
>a rope?
>I can't use a rope to cook my dinner
>total meme technology.
>>
>>103063510
Silence, boomer. It's more important for them to learn how to use AI than to practice writing papers. How often does manual math without a calculator come in handy for you?
>>
>>103063521
>>103063522
You're coping. Now I totally get why ggerganov doesn't want to add this stupid meme. Who even needs it? Why not just have an image recognition model separately, why is there need to couple it to llm?
>>
>>103063433
It's literally overpriced and no longer affordable.
>>
>>103063572
Because you can't read beyond one sentence, retard >>103063222
>>
>>103063577
What the fuck I just looked it up holy fuck.
It's more than half the cost of a fucking 3090 now.
>>
File: tenthgradecheater.png (222 KB, 884x791)
>>103063510
it's mostly good for engaging with the material outside of class (or pure busywork)
picrel
>>
>>103063551
>How often does manual math without calculator come in handy for you?
A lot, midwit. Would you not learn to walk and to talk if there was a machine to do it for you?
>>
>>103063596
>Why not just have an image recognition model separately, why is there need to couple it to llm?
>>
>>103063625
>Would you not learn to walk and to talk if there was a machine to do it for yourself?
A 5 minute stroll through any walmart filled with diabetics on mobility scooters will tell you that plenty of people do.
>>
>>103063611
I was curious so I checked the price of a used 3090, it's the exact same as when I bought my second one 2 years ago.
Kind of insane.
>>
>>103063791
Yeah they sort of edge up and down a bit but have stayed roughly the same...which given how old they are getting might as well be a price increase. But it's still at least somewhat reasonable for what you are getting.
>>
Companies who claim they're pro-open source should give me the hardware to run their models too
>>
cactus
>>
>>103062045
I find it crazy that we are almost in 2025, and nothing still beats buying a gaming GPU to do this hobby for most individual users.
>>
>>103063873
Yeah, it's been zero fucks given for hobbyists on all hardware fronts, unfortunately. I know they don't want their enterprise/workstation market going for lower-priced consumer products... but then just kneecap it on compute and give it lots of VRAM. Make essentially a 3060 with 48GB VRAM. Problem solved. Too little compute to be worth a damn to any professional client, but it gets a lot of hobbyists much deeper into the hobby for a reasonable price.
>>
>>103063873
That's what you get when you have one company producing GPUs. No, AMD and Intel don't count.
>>
>>103063896
>No, AMD and Intel don't count.
They absolutely do count in the sense that they've done literally nothing to compete for the hobbyist niche which Nvidia has left wide fucking open. They are anti-competitive cunts.
>>
>>103063873
the chinese fabs will solve this
>>
>>103063892
Surprised Nvidia doesn't go this direction, actually.
Selling a 48GB card so that people can start using CUDA as hobbyists as the next generation of people working on LLM will come from them.
The microsoft or even apple strategy to spread their OSes to students basically.
>>
>>103063896
>That's what you get when you have one company producing GPUs. No, AMD and Intel don't count.
and they don't count specifically because we are at the moment in time where you need the bleeding-edge tech in order to do the neat thing at all. Shades of late-90s Intel. It will all be commoditized soon. It's too useful not to be.
>>
>>103063925
how so
>>
>>103063864
cactus
>>
what kind of t/s am i looking at serving nemo 12b on a 3060?
i want to serve it for a small project
>>
File: hotpotcar.png (2.55 MB, 2045x1369)
>>103063941
in the same way the chinese market bears making weirdo EVs, it'll bear making weirdo GPUs (once the fabs are up and running)
>>
>>103063892
If they proceed with that design, how will they justify offering such low VRAM on their new graphics cards?
>>
>>103063955
30+
>>
>>103063988
They could have 3 options for 3 different markets:
>top of the line sota enterprise shit
>gaymin with decked out features and optimized for raytracing or whatever
>hobbyist stuff that's slightly faster than macbooks but with a ton of vram
>>
>>103064000
is that an estimate from bandwidth or real world performance?
if that's the actual performance that'd be very much sufficient
>>
>>103064014
Nvidia was always greedy with VRAM, so it is unlikely that they'll change their ways now. Remember the 970 with 3.5GB that was advertised as 4GB?
>>
>>103063988
>>103064014
I imagine they'll start to make headless vram rich inference engines that aren't economical for training at scale vs their high-end stuff once the right part of the price/performance curve opens up. I don't think there's a way to do it right now without kneecapping their enterprise stuff.
>>
>>103064021
Real hardware, linux:
3060 Nemo Q5_K_S 33 t/s
3080 59 t/s
6800xt 32 t/s
>>
>>103064090
damn
thanks
>>
>>103063966
weird doesn't mean good. it just means they'll churn out a lot of shit GPUs that no one in their right mind would buy
>>
Looks like llama 4 might be IT. Pretty sure OpenAI's / Anthropic's big secret is just grokking, aka throwing an absolute ton of compute at it till it overfits, then keeping training until it inexplicably starts generalizing again. Llama 4 is apparently going to get 100x more compute than llama 3. So, like, 1,500 trillion tokens?
>>
Started using qwen2.5-14b-instruct instead of vntl llama 8b and the quality is much better.
But sometimes it still gets some stuff wrong.
In this sentence, Ril is the one that sent the apology, but it keeps getting it wrong. I've tried all kinds of prompts but it never gets it right.

Can anyone suggest another model or prompt?
>>
>>103063966
There will be no software support for chinkshit. They'll try to make the drivers compatible with CUDA for marketing purposes, but the drivers will be full of bugs, and if you email them about that you'll get replies in chinese runes
>>
>>103064185
>llama 4
That'll come out in Q1 2025.
>>
>>103064185
You can optimize your compression algorithm as much as you want. At the end of the day if you compress filtered midwit garbage then you'll get filtered midwit models.
>>
>>103064185
With such ample computing power, they're able to run various experiments, like, you know, BitNet maybe?
>>
>>103064198
that's already better than AMD then
>>
>>103064211
I used to think that until I explored just what stuff claude 3.5 sonnet knows. It has utter trash sites and stuff baked in like literotica / fanfiction.net and the like and it still is the best model. Training is what matters.
>>
>>103064198
If they have their own fabs, there's a non-zero chance they could smuggle out the 4090 design. The Taiwanese are quite corrupt.
>>
>>103064021
I can confirm, but use exl2 and avoid llama.cpp when you can fit the entire model in VRAM; the prompt processing is basically instantaneous.
>>
>>103064271
That's why Claude beats everything else by miles, because they train on everything for rare tokens. OpenAI and Meta will never get anything good if they keep training on ScaleAI garbage. Llama3 8B didn't know what a greentext was, the 70B version did but I suspect it drew from blog posts and knowyourmeme instead of 4chan, which means they filtered this domain out of their training data.
>>
>>103064188
Assuming you can't run 20B+, I guess I would suggest vntl llama3 8b, but the older one, in case you haven't tried it yet.
>>
>>103064358
I've tried this one and it was really bad
>>
>>103064438
Yeah, take a look at the one without 202409 in the name.
>>
>>103064349
Nah, 70B 3.1 knew some stuff better than 3.0 did, showing it's just a problem of not enough training imo. It needs to "overfit" more to better retain the more trivial stuff.
>>
>>103064349
>That's why Claude beats everything else by miles
Alright, where is our local edition of Claude?
>>
>>103064504
Was it trained or distilled from 405b?
>>
>>103064504
Maybe. I tested 3.1 8b and it could make greentexts again.
>>
>>103064522
If you have ever tried training a model, then you would know they seem to forget stuff they learned earlier; then at a certain point it reemerges, often better / more accurate than it was before.
>>
>>103053148
Has anyone tried it? How well does it work?
https://huggingface.co/relaxml/Llama-3.1-405B-Instruct-QTIP-2Bit/discussions/1
>>
>>103064447
Seems to translate correctly, but refuses to do more than one line at once even in chat mode. Qwen only got it wrong when I had those two lines consecutive with one another, so I couldn't get a proper test.
>>
>>103064555
I don't know but nylon on feet looks good
>>
>>103064544
I think it's basically the "neurons" changing and losing connections they once had and later reforming them in a different / better way / place
>>
>>103064555
Still uploading it looks like.
>>
>>103064557
Translating more than one line at a time is generally a bad idea unless you're using a big model iirc
>>
>>103064583
https://huggingface.co/relaxml/Llama-3.1-405B-Instruct-QTIP-2Bit/tree/c37474cce555fe60ded7da1ea254ef19da13bcd1
>>
>>103064555
how challenging would it be to generate an image like this
>>
>>103064220
They did say llama 4 was supposed to be not only better but faster? Not sure if that means just quantization aware training or what though.
>>
>>103064637
If flux wasn't retarded for this stuff it wouldn't be impossible
>>
>>103064637
There is Ass Stacking (Pony) lora for that
>>
>>103064637
Noobaixl, stack of butts or something tag on danbooru
>>
>>103064686
Most likely quantization aware + layer skip + something else, but they could at least try BitNet
>>
File: file.png (52 KB, 871x492)
https://github.com/EikaMikiku/SillyVoice
Did a thing before voice-voice models take off.
Maybe someone will find it useful.
>>
File: 1730491728028790.gif (1.12 MB, 224x224)
just having a small 3060 server to serve nemo, drawing 40 watts at idle, would be ~15€/month here in germany (28.8 kWh at 50 cents per kWh)
no wonder all the industry is leaving
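The math checks out; a quick sketch with the figures from the post (the 30-day month is an assumption):

```python
# Idle cost of a 40W server at German electricity prices (0.50 €/kWh).
idle_watts = 40
hours_per_month = 24 * 30  # assuming a 30-day month

kwh_per_month = idle_watts / 1000 * hours_per_month  # 28.8 kWh
cost_eur = kwh_per_month * 0.50                      # ~14.40 €, i.e. the ~15€ above
print(f"{kwh_per_month:.1f} kWh -> {cost_eur:.2f} EUR/month")
```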
>>
>>103064727
>GPU: 0% 38°C 6W VRAM: 9.9GB 62%
You can have half of that, 6800xt draws only 6-8W on idle
>>
>>103064687
Are there still no Flux NSFW loras?
>>
>>103064727
Holy shit, the energy price situation in Germany is way worse than I thought. How people are still pushing for more renewables instead of nuclear is beyond me.
I'd pay 7€ where I live.
>>
shilling for monstral at 5bpw. it beats largestral, all the finetunes, and miqu. i don't know why people are saying it has issues following prompts; it seems to stay on track up to 32k.
>>
File: 1424215745865.png (31 KB, 716x302)
>>103064757
just having my cpu on idle costs me 6€/month
>>
>>103064763
There are some but it's not amazing nor ground breaking.
Maybe it will change with the new sd3.5l and sd3.5m. I hope it does, pony was fine but I want natural language prompting to be a thing, like dalle did.
>>
>>103064819
Use this:
https://civitai.com/models/833294?modelVersionId=998979

And new pony is apparently training soon and will be natural language captions as well.
>>
i'm salivating over 1-bit QTIP 405B
>>
>>103064842
>The creator of this asset requires you to be logged in to download it
Why do we tolerate this kind of racism on the internet?
>>
>>103064842
>https://civitai.com/models/833294?modelVersionId=998979
There are so many sdxl finetunes lol. I want that for newer bigger models tbdesu.

>And new pony is apparently training soon and will be natural language captions as well.
Won't be based on sd3.5 nor flux afaik, so who knows what the result will be.
>>
>>103064881
This one is legit next level though. It just came out, and it's better than pony and novelai v3. Just try complicated stuff with it that pony couldn't do / needed a lora for.
>>
did anyone manage to snag any turin engineering samples when they hit ebay briefly?
>>
>>103064727
My electricity costs between 45 and 65 cents per kWh. My homelab averages 100W and costs me almost $30/month, and it doesn't even have a GPU in it.
I'm in a major city in the US btw.
>>
>>103064894
>better than novelai v3
OK, that piques my interest. I never saw any sdxl model able to even touch it.
>>
>>103064875
https://temp-mail.org/
>>
>>103064900
wtf I thought the US had the cheapest energy prices
is it California maybe?
>>
>>103064894
what are the recommended cfg/steps/sampler?
>>
Just the qtip
>>
>>103063919
There is no hobbyist niche ... a couple thousand people simply don't matter.
>>
>>103064698
>>103064696
>>103064687
we live in an age of hyper optimization
>>
>>103064916
New beta sampler in comfyui is nice. Not sure what it might be called in forge / reforge. Usual 4-7 cfg. Most samplers are 30-50 steps for decent results.
>>
>>103064842
why is civitai completely killing my chromium
>>
File: 1710004668627709.png (211 KB, 749x650)
>>103063001
Indeed. It is worth paying such homage to.
[pretenditworkspoiler]Even though I never watched the film it came from.[/spoiler]
>>
Is the MGS2 colonel voice available somewhere? I cannot find the model online.
>>
Is there no easy diffusion equivalent for TTS?

I saw that XTTS is pretty good for voice cloning, but I can't exactly figure out how to install it.
>>
>>103065158
Here's 2 options.
https://github.com/effusiveperiscope/GPT-SoVITS

https://github.com/fishaudio/fish-speech/blob/main/Start_Agent.md
>>
>>103065169
There's also https://huggingface.co/SWivid/F5-TTS
>>
File: 4bc8f6c0-8262.jpg (6 KB, 241x209)
>>103065169
Thank you anon. God bless
>>
>>103064915
Yep California, gas is expensive as fuck here too.
>>
So is QTIP really better than IQ quants?
>>
>>103065206
then yeah makes sense
Cali if the most european like in its tax structure
>>
>>103065213
maybe
>>
>>103064959
>a couple thousand people
There's 8 billion people on the planet. 4 billion if you exclude people living in abject poverty. a 1% niche hobby would have 4 million participants.
Let's say 1% of people are computer enthusiasts and AI is a 10% niche within the computer enthusiast space. You're looking at hundreds of thousands of people. That's why P40s got memed up to half the price of a 3090 because there's a lot of people trying to get their hands on them. Thankfully most of them are too poor to go for 3090s otherwise that market would get wrecked too.
>>
>>103065276
oops I fucked up that math.
1% niche would be 40 million participants
So millions of people potentially competing for e-waste computer parts right now.
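Redoing the estimate with the corrected base (numbers straight from the posts above; the percentages are the same guesses):

```python
# 4 billion people after excluding abject poverty, per the post above.
population = 4_000_000_000

# Direct 1% niche, as corrected: 40 million, not 4 million.
direct_niche = population // 100

# Two-stage guess: 1% computer enthusiasts, 10% of those into AI.
two_stage_niche = population // 100 // 10
print(direct_niche, two_stage_niche)
```

Even the conservative two-stage version still lands at millions of potential buyers, which is plenty to move the used-GPU market.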
>>
>>103065283
I also heard rumors of Chinese companies hoarding 3090s (and 4090s but there are less of these used).
>>
File: 1730138368937693.png (1.37 MB, 1024x1024)
>>103065276
The availability of affordable GPUs may lead to increased competition with cloud-based AI services, potentially reducing the demand for H100s.
>>
File: pepe-laugh.jpg (6 KB, 231x218)
>>103058013
SOVL
>>
>>103064975
>New beta sampler in comfyui is nice
what's this called? I actually broke down and took the noodle pill to get a good flux workflow going.
>>
>>103065380
beta
>>
>>103065118
what movie?
>>
>>103065425
Three Days by Sharunas Bartas
>>
>>103065155
>MGS voice
just use any TTS with a short audio clip. Here's a super low-effort one made with one random 5-second MGS wav from a soundboard and the pretrained model gpt-sovits ships with.
https://vocaroo.com/1nK4tXqqlCRO
>>
File: IronMiku.png (1.71 MB, 832x1216)
>>103065065
because your computer is WEAK
(and because their webdev is shit and disrespectful of their users)
>>
>>103064555
I wanted to try a 70B at 2-bit a long time ago, but of course it required some dependency I couldn't install. I suspect this is the same.
>>
>>103065633
https://github.com/Cornell-RelaxML/qtip
>>
Nemotron vs largestral?
Also, how is the now taken down wizardlm holding up?
>>
>>103065667
Go back to /r/localllama, shill.
>>
>>103065653
Cool story bro, but just search HF for QuIP quants. You can see nobody used this shit, probably because of what I said.
>>
>>103065716
sir you forgot to take your pills
>>
>>103065667
Get back to /r/eddit
>>
>>103065718
https://huggingface.co/collections/relaxml/qtip-quantized-models-66fa253ad3186746f4b62803
>>
>>103061671
>>103061671
>>103061671
New thread
>>
>>103065667
I personally hate nemotron because it likes to format everything as a list. Largestral doesn't have that problem. Wizard is outdated and dumb.
>>
>>103065948
It does not do that if you tell it to RP. Or if you use one of the finetunes. Or you can embrace it and tell it to plan its response first.
>>
>>103065891
oh right. newfags don't know that the baker is a mentally ill schizo.
>>
>>103065494
Thanks, looks great. Didn't test yet
>>
https://www.pcworld.com/article/2504035/security-flaws-found-in-all-nvidia-geforce-gpus-update-drivers-asap.html
Update your drivers. Don't mind the -15% in t/s throughput.
>>
any idea where i can get a local model that does something similar to chatpdf? it essentially takes a pdf and produces a bunch of questions, summaries, and exercises based on the pdf, which tends to be an ebook.

are there any models on hf that can do something like that or better?
>>
>>103066291
Go to reddit newfriend.
>>
>>103066291
SillyTavern has a feature called Data Bank.
It's a little rough, but it can be used for that, although you'd be better off turning the PDF into cleaned raw text files.
As for the models? Anything you can run with a large context. Stuff like nemo, for example.
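Under the hood that kind of setup is just retrieval + prompting. A toy sketch of the idea (the word-overlap scoring here is a stand-in; real tools use embedding similarity, and the doc/query strings are made up):

```python
# Toy RAG: chunk the cleaned text, pick the chunk that best matches
# the question, and prepend it to the LLM prompt.

def chunk(text, size=200):
    """Split text into chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks, query):
    """Return the chunk sharing the most words with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

doc = ("Chapter 1 covers sorting algorithms. "
       "Chapter 2 covers graph traversal and shortest paths.")
best = retrieve(chunk(doc, size=6), "which chapter covers graph traversal")
prompt = f"Context:\n{best}\n\nWrite three exercise questions about the context."
```

You then send `prompt` to whatever model you're running; the retrieval step is what keeps a whole ebook from blowing past the context window.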
>>
>>103066306
>>103066330
ah so it's a matter of front-end functionality, okay thank you. I've been using deepseek v2.5 as a programming tutor so I assume that should suffice, right?
>>
>>103066291
Holy retards. Imagine a normalfag hitting you with "Eh bro I just saw the last tesla video, is there a local robot that can do my dishes like that for free? Robotics? Nah I don't want to learn all that stuff, just give me the end product thanks."
>>
>>103066354
If you are running it with a large enough context window, sure.
Might want to look for a dedicated RAG solution. I remember trying the jan.ai one and it was okay, I guess.
>>
>>103066358
It's not his fault local bros are not developing tools for their LLMs.
>>
What models are best for erotic roleplay? I can't find a half decent one.

LLaMa 3.x:
- Can follow structure of roleplay
- Can remain sensical and coherent
- Can NOT stray from "safe" programming

SmolLM2 1.7b:
- Can follow structure of roleplay
- Can NOT remain sensical and coherent
- Can stray from "safe" programming

Unholy v2 13b:
- Can NOT follow structure of roleplay
- Can remain sensical and coherent
- Can stray from "safe" programming
>>
>>103066402
You need a 70B model
>>
>>103066402
Mistral Small is the best small model.
>>
>>103066402
Try Behemoth-123B-v1.1
>>
>>103066402
>- Can NOT stray from "safe" programming
Prompt issue.
>>
>>103066358
got it, thanks
>>103066375
thank you, i'll give this a try. i tried silly tavern a while ago and was completely filtered (especially the API key part). this site seems to dumb it down for retards like myself.
>>
>>103066415
I can't handle 1 tok/s
>>103066435
no it's not, faggot. there's no magic prompt that unfucks llama 3.x
>>
File: 1714976103618856.png (270 KB, 1717x1517)
>>103066487
Explain this image.
>>
>>103066487
>I can't handle 1 tok/s
Try 30Bs. Not as good but the next best option
>>
>>103066494
looks pretty cringe, desu
>>
>>103066170
>privilege escalation
Nothingburger unless you rent out your hardware.
>>
Why does it do this
Fucking hella good translation, it just fucking forgets to keep translating
>>
>>103066620
(Translator's note: keikaku means plan)
>>
File: 1711668129691197.png (1.21 MB, 2160x2160)
>>103066795
>>103066795
>>103066795
New thread
>>
>>103066620
Let me guess, that's Qwen
>>
>>103066871
yes, works great until it decides not to
>>
>>103066620
Use something that lets you see the token probabilities; it might have been a low-probability token, and you can adjust min_p/top_k for that.
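For reference, min_p as usually defined keeps only tokens whose probability is at least min_p times the top token's probability. A toy sketch (not any backend's actual code; the distribution below is made up):

```python
# Toy min_p filter: prune tokens far below the most likely one, which
# is how a stray low-probability derail token gets removed.

def min_p_filter(probs, min_p=0.05):
    """Keep tokens with p >= min_p * max(p)."""
    threshold = min_p * max(probs.values())
    return {tok: p for tok, p in probs.items() if p >= threshold}

# Hypothetical next-token distribution mid-translation.
probs = {"translation": 0.62, "the": 0.30, "keikaku": 0.005}
kept = min_p_filter(probs, min_p=0.05)  # drops "keikaku"
```

top_k is the blunter version of the same idea: sort by probability and keep only the k highest before sampling.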



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.