/g/ - Technology
File: 1724876676171451.jpg (813 KB, 1920x2480)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102130111 & >>102114085

►News
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102130111

--Quantization effects on model accuracy and knowledge retention: >>102132002 >>102132446 >>102132496 >>102133471 >>102134028 >>102137139
--/lmg/ is dead, and anons discuss quantization in LLaMA-3: >>102130181 >>102131008 >>102130929 >>102131088 >>102131027 >>102131052 >>102131081 >>102130965 >>102131048 >>102131095 >>102131114 >>102131200 >>102131234 >>102131276 >>102131294 >>102131299
--Running large AI models on limited hardware, alternatives and workarounds discussed: >>102131778 >>102131942 >>102132025 >>102132181 >>102132333 >>102138543 >>102138984 >>102139800 >>102136953 >>102131860
--Magic Labs announces LTM-2-Mini with 100M token context window: >>102141258 >>102141443 >>102141696 >>102141870 >>102141784
--Llama-server vs koboldcpp discussion and FA performance issues: >>102143726 >>102143744 >>102143765 >>102143826 >>102143919 >>102143942 >>102143997 >>102144101
--KTO outperforms DPO but is much slower: >>102132550 >>102132712
--Engineers vs scientists: understanding AI is not necessary for practical applications: >>102138998 >>102139414 >>102139654 >>102140159 >>102141060 >>102141245
--Claude and ChatGPT accused of ripping text from video games and fanfiction sites: >>102133382 >>102133387 >>102133414
--Anon wants to modify CogVideoX-5B to accept images as input: >>102130233
--Anon seeks model for generating hentai scenes for VN, Layla app and ggufs models suggested: >>102142048 >>102142112 >>102142186 >>102142502 >>102142739 >>102142135 >>102142282
--Modded Chinese cards created to circumvent US export ban on Nvidia's high-end cards: >>102133029
--Llama sees 10x growth since 2023, leading AI innovation: >>102139603
--Anon shares models and settings for NSFW content generation: >>102143487 >>102143674 >>102143833 >>102143894 >>102144105 >>102144220 >>102144407 >>102144422
--Miku (free space): >>102135364 >>102142492 >>102144385

►Recent Highlight Posts from the Previous Thread: >>102130124
>>
first
>>
>>102145961
boring
>>
File: woman.png (65 KB, 1558x377)
the pajeet menace has reached huggingface
>>
>>102146202
adolf hitler
>>
>>102146202
interesting
>>
>>102146202
Where did he do the needful? Imagegen model?
>>
Is there any hope as a 4GB vramlet to run anything good that isn't 1 T/s?
>>
>>102146264
>https://huggingface.co/spaces/dalle-mini/dalle-mini/discussions/11329
>11329 discussions
How many sirs are there?
>>
File: funny.png (71 KB, 845x325)
>>102146301
hahahaha third worlders can't be human
>>
>>102146295
Absolutely not. I wouldn't even call 4GB 'vramlet' in current year. That's just pure poverty.
>>
>>102146295
https://huggingface.co/Muhammad2003/Orpo-Phi3-3B-128K
Try RPing with this, maybe it's so retarded it loops back to soulful
>>
>>102145958
lmao, great image
>>
>>102146347
fucking indians man, jesus christ
>>
File: 1709515264068417.jpg (567 KB, 1792x2304)
>>102145958
hello /lmg/
>>
>>102146439
Hello Miku
>>
>>102146202
https://youtu.be/KeVofHZ-VvQ?list=PLihy8fVYWAMBCle6_6-M6YVYBGMNICpQq&t=1384
Reminded me of this. Except instead of Cthulhu in cyberpunk, it is modernity and pajeets on the internet. Same feeling. I don't know if those are people or bots.
>>
>>102146295
7b/8b models and one or more of these techniques to make them fit: quantization (don't use less than 4 bit), offloading a few layers to CPU, offloading the kv cache to system RAM, quantizing the kv cache to 4 bit, using a small context length
koboldcpp supports all of those
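The fitting advice above can be sanity-checked with rough arithmetic before downloading anything. A minimal sketch; the model dimensions, bits-per-weight, and offload fraction below are illustrative assumptions, not exact values for any particular gguf:

```python
# Back-of-envelope check: does a quantized 7B model plus kv cache fit in 4 GB
# of VRAM? All numbers are approximations for a hypothetical 7B-class model.

def model_vram_gb(n_params_b, bits_per_weight, gpu_layer_frac):
    """VRAM for weights: params * bits / 8 bytes, scaled by the fraction of layers kept on GPU."""
    return n_params_b * 1e9 * bits_per_weight / 8 * gpu_layer_frac / 1e9

def kv_cache_gb(n_layers, ctx, n_kv_heads, head_dim, bits_per_elem):
    """kv cache size: 2 (K and V) * layers * context * kv heads * head dim elements."""
    return 2 * n_layers * ctx * n_kv_heads * head_dim * bits_per_elem / 8 / 1e9

# Hypothetical 7B-class config: 32 layers, 8 kv heads, head_dim 128.
weights = model_vram_gb(7, 4.5, gpu_layer_frac=0.5)   # ~Q4 quant, half the layers offloaded to CPU
cache = kv_cache_gb(32, ctx=4096, n_kv_heads=8, head_dim=128, bits_per_elem=4)  # q4 kv cache

total = weights + cache
print(f"weights ~{weights:.2f} GB, kv cache ~{cache:.2f} GB, total ~{total:.2f} GB")
```

With those assumptions the combined techniques land comfortably under 4 GB, which is why stacking them works where any single one might not.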
>>
>>102146439
gm butifel plaes came to india i recieve you
>>
>>102146439
pretty girl please needful now
>>
>>102146202
GOOD MORNING SAARS
>>
>>102146592
gm
>>
did they make claude (cloud model, not a local model) more retarded recently?
>>
>>102146295
You can run nemo. I just tried it at q8 with 0 layers (ctx limit set to 128k). It used 1884MiB vram and I got 4.9T/s.
>>
>>102146636
no
>>
>>102146347
this is one of those phrases I'm going to end up saying to myself and cackling like a retard for like a week
>>
>>102146636
Did you come over from /aicg/ to ask localbros if the claims from claudebros that Opus got retarded are true?
>>
>>102146347
Third Worlder here, I confirm I transcend humanity.
>>
>>102146824
no i never go there, since I don't roleplay. if they're saying the same thing though I feel vindicated
I like the idea of local models I just don't have enough ram to run the good ones
>>
Can someone tell me what I'm doing wrong with my SillyTavern context template? The default ChatML template doesn't do anything to distinguish example messages from the real chat, and I was trying to fix that.

My "example separator":
<|im_start|>system
This is the example separator<|im_end|>

My "chat start":
<|im_start|>system
This is the chat start<|im_end|>


But looking at what's actually being sent to llama.cpp, before every example chat the separator is duplicated: once correctly, and once in the assistant role with the character name prepended
<|im_start|>system
This is the example separator<|im_end|>
<|im_start|>assistant
Waifu: This is the example separator<|im_end|>

and before the start of the real chat there's this nonsense
<|im_start|>system
This is the example separator<|im_end|>
<|im_start|>system
This is the chat start<|im_end|>
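For comparison, a well-formed ChatML prompt keeps every message enclosed in a matched <|im_start|>role ... <|im_end|> pair, with no stray text outside the turns. A minimal sketch of assembling one; the helper function is invented for illustration, not SillyTavern's actual template code:

```python
def chatml(messages):
    """Assemble a ChatML prompt: each (role, text) pair becomes one
    <|im_start|>role ... <|im_end|> block, so nothing sits outside a turn."""
    return "\n".join(
        f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in messages
    )

prompt = chatml([
    ("system", "This is the example separator"),
    ("user", "Waifu is an example character."),
])
print(prompt)
```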
>>
File: 1724876676171452.jpg (216 KB, 1024x872)
>>102146421
finetooning with miku
>>
>>102146755
I am putting my screwdriver everywhere and seeing what is happening
>>
File: file.png (168 KB, 537x1097)
>>102146949
ST's example dialogue handling can go fuk itself lol.
Basically, example dialogues will appear as user/model turns, and the separator / chat-start will appear without start/end-of-turn tokens.

One thing you can do is remove the end token from the story string and add it to chat start instead; then Skip Formatting will put unformatted (formatted = turns) example dialogues into the "story string".
>>
>>102147085
finetrooning
>>
>>102146949
I just set the example text with an indicator sort of like here >>102147207
In the system prompt, I instruct the model to notice that anything under that indicator is an example. That Anon's method is a bit better as it tricks the model into acknowledging that it noticed the examples.
>>
>>102147085
>finetrooning with miku
>>
>>102147273
Why did it take so long for this term to surface?
>>
>>102147085
>fine crooning by migu
>>
File: file.png (18 KB, 465x296)
>>102146949
Default ChatML template + putting something in separator/start without start/end sequences gives this
<|im_start|>system
SYSTEM PROMPT<|im_end|>
SEPARATOR
<|im_start|>user
User: Test.<|im_end|>
<|im_start|>assistant
Assistant: Test acknowledged.<|im_end|>
START ( [Start a new Chat] presumably implies the above isn't immediately part of the chat )
<|im_start|>user
User: ABC.<|im_end|>
<|im_start|>assistant
Assistant:

Hypothetically it should just work depending on the model. Just weirds me out that there's stuff unenclosed in turn tokens. And that wouldn't happen in chat completion. Doing as I described bottom of >>102147207 will mimic picrel.

Also example dialogues are to be formatted as
<START> (this is the macro for separator, anything else will not work and will be carried into Name:)
{{user}}: something
{{char}}: something
>>
CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions
>We demonstrate that carefully adjusting the tokenizer of the Whisper speech recognition model significantly improves the precision of word-level timestamps when applying dynamic time warping to the decoder's cross-attention scores. We fine-tune the model to produce more verbatim speech transcriptions and employ several techniques to increase robustness against multiple speakers and background noise. These adjustments achieve state-of-the-art performance on benchmarks for verbatim speech transcription, word segmentation, and the timed detection of filler events, and can further mitigate transcription hallucinations.
https://github.com/nyrahealth/CrisperWhisper
no code yet. posting in case it's actually useful
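The dynamic-time-warping step the abstract leans on is just the classic alignment algorithm; a textbook sketch over two 1-D sequences (generic DTW, not CrisperWhisper's actual implementation, which aligns cross-attention scores):

```python
import math

def dtw_cost(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping: minimal cumulative cost of aligning
    sequence a to sequence b, allowing elements to repeat across the warp."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend the cheapest of: skip in a, skip in b, or advance both
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# An inserted repeat aligns for free, which is what makes DTW suited to
# matching token positions against uneven audio frames.
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```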
>>
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
https://arxiv.org/abs/2408.16293
>Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating "error-correction" data directly into the pretraining stage. This data consists of erroneous solution steps immediately followed by their corrections. Using a synthetic math dataset, we show promising results: this type of pretrain data can help language models achieve higher reasoning accuracy directly (i.e., through simple auto-regression, without multi-round prompting) compared to pretraining on the same amount of error-free data. We also delve into many details, such as (1) how this approach differs from beam search, (2) how such data can be prepared, (3) whether masking is needed on the erroneous tokens, (4) the amount of error required, (5) whether such data can be deferred to the fine-tuning stage, and many others.
last part got posted. worth reading the series or watching his presentation https://iv.ggtyler.dev/watch?v=yBL7J0kgldU
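The "error-correction" pretrain data the paper describes (an erroneous solution step immediately followed by its correction) might look like the following; the format and the [BACK] retraction marker are invented here for illustration, not the paper's exact encoding:

```python
def make_error_correction_example(question, wrong_step, right_steps, marker="[BACK]"):
    """Build one pretraining string: a mistaken step, a retraction marker,
    then the corrected solution, all as plain autoregressive text."""
    lines = [question, wrong_step, marker] + right_steps
    return "\n".join(lines)

sample = make_error_correction_example(
    "Q: Tom has 3 bags of 4 apples. How many apples?",
    "Step 1: 3 + 4 = 7",                    # the injected mistake
    ["Step 1: 3 * 4 = 12", "Answer: 12"],   # the correction that follows it
)
print(sample)
```

Pretraining on strings like this is what lets the model self-correct in a single autoregressive pass, without multi-round prompting.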
>>
>>102146347
https://huggingface.co/spaces/dalle-mini/dalle-mini/discussions/11298
make hot boobs
>>
Based California just killed openjeet meme AI with SB 1047 bill.
>>
>>102147743
*moves to texas*
>>
>>102147793
It should be countrywide desu
>>
Why don't some of my LoRAs show up in webui?
Does it know if it isn't compatible?
>>
>>102147743
Pelosi came out against it so things are looking grim for the leftist crowd
>>
>>102147804
shitty bait
>>
>>102147743
Is that the only reason musk supported it? It seems like a dumb bill otherwise.
>>
>>102147703
Those, and other videos of his, helped me learn. Recommended.
>>
>>102147793
Don't worry, a version of it will become federal law eventually
>>
>>102147688
404
>>
File: file.png (275 KB, 593x1103)
lmsys style control
>>
>>102148412
I thought 405B was supposed to be a disappointment that nobody should ever use
>>
File: Untitled.png (58 KB, 1774x414)
>>102147743
How much money do these large models actually cost to train?
>>
https://www.pornhub.com/view_video.php?viewkey=65a312fa22ee6
>>
File: file.png (144 KB, 623x591)
Am I doing this right? Is this how it works? I feel like I'm using too much space doing "She has X, she has Y," etc...
>>
>>102149146
>She
intersperse more character name (NOT {{char}})
>>
>>102148937
What is it? I'm at work.
>>
>>102149155
Cool, I'll do that.
>>
>>102149155
Why the name vs {{char}}? Doesn't it end up the same thing?
>>
>>102145958
>>102146421
I would totally watch that
>>
>>102149506
{{char}} is the name of the card, not necessarily the name of the character.
>>
>>102147936
his gpu clusters are in Texas. So yeah Californians need some serious regulation. They can't be trusted
>>
>there's a Satania lora finally
nice
>>
>>102149796
bikini and megatits NOW
>>
>>102147811
Yes.
>>
>>102149928
Sorry, too busy genning memes.

Also, feels like the image quality is a bit wonky with this lora sometimes. Oh well, maybe I'll wait for a better one to get made.
>>
File: file.png (248 KB, 710x300)
It is time
>>
>>102150192
August 29 was yesterday.
>>
>>102150119
just lower the strength a little
>>
Is there any good free upscaler? I'm poor.
>>
>>102150294
It's already not getting her hair right a lot of the time so doing this makes it look even less like Satania.
>>
>>102147085
cute!
>>
>>102150192
kys
>>
>>102150119
I'm personally finding training flux loras to be pretty wonky. I'm kinda finding it hard to get a concept in without inadvertently making a bad style lora. Probably gonna try out masked training loss, just need to use SAM 2 to segment out relevant bits.
>>
>>102150376
the other strength
>>
>>102145958
>Qwen2-VL 7B
just a sidegrade to Idefics3 8B LLaMA
>>
>>102150721
How are you running it if you don't mind me asking?
>>
>>102149591
thank you
>>
What was that thing called that was supposed to replace lore books?
>>
>>102151345
... worldinfo?
>>
>>102151365
I don't think so? I don't know, that's why I'm asking. I remember reading there was some new system that replaced all the lore/world book stuff entirely (somehow) but lost the page.
>>
Unbelievable how much hype llama3 had and a couple months later there isn't a mythomax3 or anything really.
>>
>>102145961
hot
>>
>>102151523
different architectures and all the finetunes that were used to merge into mythomax either don't exist, or their datasets have been changed far enough that they're not the same anymore.
>>
>>102151523
we just have to WAIT
>>
>>102151073
I didn't try to run any of those models, I only looked at the benchmarks. But I hope to get around to playing with VLMs sooner or later.
>>
>>102150721
Qwen2-VL can take videos over 20 minutes long as input. Idefics3 is image only.
>>
>>102146439
>watermark on an ai generated image
good maarning sar
>>
>>102146301
the line between bot and indian is very thin
>>
Dear /g/ sars. I'm wanting to build an autonomous lawn mower that uses a camera (and GPS for setting boundaries) to do the needful on my property.
The neural net will be a simple image classifier:
Image In -> Bearing (Angle) Out
I'm planning on getting test data by mounting a camera + GPS to determine bearing (and to some degree speed) onto my existing Hustler Raptor™ 42" Zero-Turn.
To make the neural net flexible (and mountable at different heights, angles, etc. - and on different frames), I've been thinking that if I put some standout-ish red tape around the edges of the mower, the network might be able to learn this as a concept of when it needs to turn (e.g. red tape touches long grass).
Does this sound like it could work? Any other suggestions?
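One way to frame the image-in, bearing-out net is as classification over discretized angle bins, which is friendlier to train than direct angle regression (angles wrap at 360). A minimal sketch of the binning logic; the bin count is an arbitrary assumption:

```python
NUM_BINS = 36  # 10-degree bins over a full circle; arbitrary choice

def bearing_to_bin(bearing_deg):
    """Map a GPS-derived bearing in degrees to a class index for the classifier's target."""
    return int(bearing_deg % 360 // (360 / NUM_BINS))

def bin_to_bearing(bin_idx):
    """Map a predicted class index back to the bin-center bearing in degrees."""
    return (bin_idx + 0.5) * (360 / NUM_BINS)

# A camera frame labeled with bearing 97.3 deg becomes training class 9;
# at inference, the argmax class decodes back to a steering bearing.
assert bearing_to_bin(97.3) == 9
print(bin_to_bearing(9))
```

The red-tape idea then reduces to the net learning which bin keeps the tape off the long grass, rather than any explicit edge-detection rule.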
>>
File: 1707913599966602.png (53 KB, 976x433)
>>102146301
>>
>>102151992
why would you need that, just mow the whole lawn at once
>>
>>102151992
You're going to kill someone (or at least someone's dog) with an out of control lawnmower
>>
>>102151992
Sir this is the cooming to purple prose general, we have no actual idea about machine learning
>>
>>102151992
Please record the camera data and post the victims here.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.