/g/ - Technology

File: 1711682197915385.jpg (325 KB, 1536x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101799533 & >>101790214

►News
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: ComfyUI_05705_.png (467 KB, 720x1280)
►Recent Highlights from the Previous Thread: >>101799533

--Sunfall and daybreak models, quantization, and calibration datasets discussion: >>101800150 >>101800179 >>101800449 >>101800668 >>101800718 >>101800766 >>101800847 >>101801005 >>101800582 >>101800965 >>101801193 >>101801201 >>101801212 >>101801307 >>101801422 >>101801446 >>101801564 >>101801604 >>101801651 >>101803858 >>101803915 >>101804061 >>101804089 >>101804181 >>101804539 >>101804101
--Qwen2-Audio: an audio-input language model for audio analysis and transcription: >>101801514 >>101801660 >>101802934
--MiniCPM demo can take video input, but multimodal support is limited: >>101805330 >>101805433 >>101805544 >>101805573 >>101805639 >>101805746 >>101805768 >>101806150
--Epyc CPU/DDR5 build performance with Llama 405B Q4_K_M: >>101799698 >>101799704 >>101799733 >>101799770 >>101800490 >>101803630
--Anon tests AI model with riddles, exposing its limitations and flaws: >>101802472 >>101802492 >>101802527 >>101802732 >>101802983 >>101802779 >>101802914 >>101802897 >>101803037 >>101803060 >>101803378 >>101803572
--Using draft models for lewds and GPU offloading with llama.cpp: >>101804687 >>101804743 >>101804790 >>101804832 >>101804984 >>101805075
--Adding tone-setting info to corpo model prompts: >>101802258 >>101802313 >>101804864 >>101804892
--Transformer Explainer resource shared: >>101805818
--Torchchat has experimental distributed inference support: >>101801911 >>101803200
--DRY sampling settings and creator's recommendation on rep pen: >>101805537 >>101805646 >>101805742 >>101805789
--Command R+ produces compelling but unpredictable output: >>101801795 >>101801834 >>101801946
--Anon confused about Python script, no .exe file needed: >>101799806 >>101799817 >>101799829
--Miku (free space): >>101802634

►Recent Highlight Posts from the Previous Thread: >>101799548
>>
>>101808011
dont quote me mikufaggot
>>
>>101808060
Stop contributing constructively then
>>
>>101808052
I don't actually know I've never used it
>>
File: 1714124478459701.png (1000 KB, 1216x1624)
Are you ready for open source AGI, /lmg/? 20 minutes.
>>
>>101808194
https://files.catbox.moe/b5xli5.png
>>
>>101808238
mmm sour cream with strawberries
>>
How many r's are in "strawberry"?
>>
>>101808274
There are no r's in "strawberry". I only see "strawberry".
>>
>>101808154
https://openai.com/index/gpt-4o-system-card/
ctrl+f "unintentional" if you want to listen to the audio of it
>>
>2 * 4060ti 16gb
>2 * 7600xt 16gb
>2 * a770 16gb
which is better for 70b models? are there any cheaper alternatives that don't suck?
>>
>>101808319
Kinda creepy
>>
>>101808194
I'm ready for the memes to end
>>
>>101807033
LLMs haven't been next token predictors for a long time now. RL moves you out of the prediction domain and into the selection domain.
Btw, I expect we'll need a new training paradigm (including interaction with the real world) to significantly surpass AGI and get to ASI, but we have all the techniques we need to create AGI today.

>>101807068
Think of it like this: AI models are just function approximators. We use training methods (supervised/unsupervised/reinforcement learning implemented with stochastic gradient descent) to shape those functions into the thing we want. Higher parameter models can represent more complex functions like higher resolution images can represent more complex objects. Current models are dumb because they don't have enough expressive capacity to represent the full depth of human-level cognition, only the surface level features.
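the "shape those functions with SGD" bit, as a toy sketch (one parameter, squared error; obviously not a real training stack):

def sgd_step(w, x, y, lr=0.01):
    # one stochastic gradient descent step on f(x) = w * x
    pred = w * x
    grad = 2 * (pred - y) * x  # d/dw of (w*x - y)**2
    return w - lr * grad

w = 0.0
for _ in range(1000):
    w = sgd_step(w, 2.0, 6.0)  # teach it that f(2) should be 6
print(w)  # converges toward 3.0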
>>
>>101808338
A single 3090. I'm not kidding. You need to hold a copy of the cache in each GPU, so that already negates most of the 8 GB extra you'd have with 2x16 GB, depending on your context length. And then factor in the fact that multiGPU is slower than having the model all on the same card.
>>
>>101808338
nvidia. there is support for amd but it can be iffy. intel hasn't proven themselves yet and is currently melting down from the 13/14th gen fiasco. the 4060 ti 16gb isn't a great card though, very limited by memory bandwidth and stuff; it doesn't really affect its ai capability, but it won't be great for gaming. if you could get a 4070 16gb it'd be better. none of those options overall are great for ai though
>>
>>101808194
fake and gay. They couldn't even release sora or gpt-voice and you fags think they'll drop agi on a github repo? Lmao.
>>
Where is it
>>
>>101808472
it's happening
>>
>its nothing
as expected
>>
>NOTHING HAPPENED

AHAHAHAHHAHAHAHAHAAHAHAHAHA
>>
>https://github.com/iruletheworldmo/strawberry/blob/main/strawberry.py

kek, you waited for this
>>
https://github.com/iruletheworldmo/strawberry/blob/main/strawberry.py
>>
https://github.com/iruletheworldmo/strawberry/blob/main/strawberry.py
Lmao here's your AGI bro
>>
>>101808501
LOL
LMAO

nice "agi" retards
>>
>>101808499
>>101808501
>>101808505
go back
>>
File: strawberry.py.png (2 KB, 153x195)
What did AGI mean by this?
>>
>>101808513
>>101808509
>>101808505
>>101808501
>>101808499
>>101808496
>>101808491
>>101808486
This Python code is designed to create a simple animation in the terminal, depicting a worm (or similar creature) moving up and down through soil. Here's a breakdown of what each part of the code does:

Key Components:
ANSI Escape Codes:

YELLOW and BROWN are defined to change text color in the terminal. YELLOW is a bright yellow color, and BROWN is a specific shade of brown.
RESET is used to reset the terminal text color back to default.
garden(sub_stage=0) Function:

This function generates a "frame" of the animation. The frame is a list of strings, each representing a line in the terminal.
The bottom six lines of the frame (frame[4] to frame[9]) represent the soil, using a combination of tildes (~) and hash symbols (#) in brown color.
The worm is represented by (q*) in yellow color.
The worm's position is determined by the sub_stage parameter:
When sub_stage is between 0 and 3, the worm appears above the soil.
When sub_stage is 4 to 7, the worm gradually burrows into the soil.
delicious() Function:

This function continuously loops through the stages (0 to 7) of the animation:
It clears the terminal screen (os.system('cls' if os.name == 'nt' else 'clear')), then prints the current frame of the animation.
The animation pauses briefly (time.sleep(0.5)) between frames to create the effect of movement.
After completing one cycle of movement, it pauses for a second before repeating.
__main__ Block:

This is where the animation begins running when the script is executed. The delicious() function is called in an infinite loop.
If the user interrupts the script (e.g., by pressing Ctrl+C), it will catch the KeyboardInterrupt exception and print "\nAnimation stopped." before exiting.
>>
>>101808549
yes, anyone with basic knowledge of programming understood what this code does at a glance.

go back.
>>
wow who could have seen that coming
ai hype train retards deserve this embarrassment kek lmao
>>
>>101808499
This is very impressive if AI wrote this code...
>>
>>101808611
Is it? AI has been able to write simple but complete games in python zero shot for a long time.
>>
>>101808611
erm... no...
>>
>>101808543
blacked anal with tattoo
>>
someone ask the new opensource strawberry gpt how many Ls there are in "gullible"
>>
The python script was generated by strawberry, btw.
>>
>even /g/ is highly susceptible to 2-more-weeks attacks.

LOL. Trust the plan amirite?
>>
>>101808652
Yeah, I'm thinking AGI is back on the menu.
>>
wow the strawberry.py is so efficient and elegantly written what the fuck
its actual Q* AGI
>>
would my phone catch fire if I ran an llm on it 24/7
>>
Can we mass execute cuckman shills now
>>
>>101808705
One can only hope.
>>
>>101806184
>>101806184
>>101806184

so anon would you like to explain yourself
>>
>>101808732
Yes
I made it all up
>>
>>101806184
>>101806404
Why did you lie?
>>
>unable to ship anything substantially new or interesting for a full year
>getting mogged by Anthropic who are dropping new sota models and papers every other month
>decide to generate hype with reddit-tier alt-reality fiction games for your gullible indian fans
>>
>>101808741
funni
>>
File: lf.png (14 KB, 309x176)
2 more weeks of growing
>>
>>101808549
Code generated by 3.5 Sonnet:
import os
import time

# ANSI escape codes for colors
YELLOW = "\033[93m"
BROWN = "\033[38;5;130m"
RESET = "\033[0m"

def garden(sub_stage):
    # four rows of sky (4chan ate the padding; assuming 20-wide rows), then soil
    frame = [
        " " * 20,
        " " * 20,
        " " * 20,
        " " * 20,
        f"{BROWN}~~~~~~~~~~~~~~~~~~~~{RESET}",
        f"{BROWN}####################{RESET}",
        f"{BROWN}####################{RESET}",
        f"{BROWN}####################{RESET}",
        f"{BROWN}####################{RESET}",
        f"{BROWN}####################{RESET}"
    ]

    worm = f"{YELLOW}(q*){RESET}"

    if sub_stage < 4:
        # worm descends through the sky rows
        frame[3 - sub_stage] = worm.center(20)
    else:
        # worm burrows along the top soil line
        soil_line = list(frame[4])
        soil_line[sub_stage * 2 - 4:sub_stage * 2 - 1] = worm
        frame[4] = ''.join(soil_line)

    return frame

def delicious():
    while True:
        for stage in range(8):
            os.system('cls' if os.name == 'nt' else 'clear')
            frame = garden(stage)
            print('\n'.join(frame))
            time.sleep(0.5)
        time.sleep(1)

if __name__ == "__main__":
    try:
        delicious()
    except KeyboardInterrupt:
        print("\nAnimation stopped.")
>>
>>101808741
>>101808732
Believing rando's on mongolian basket weaving forum

You are massively gay
>>
>>101808760
slop code and broken

og strawberry.py was written by actual AGI
>>
File: card.png (97 KB, 600x335)
Let's see if a local model can write strawberry.py
VRAM hoarders are awfully quiet.
>>
>>101808759
Can't wait till the cycle repeats in literally 2 weeks but this time it's an animation of watering the seed.
>>
>>101808803
DeepSeek Coder V2 and Llama 3.1 405b easily could. Maybe even the old CodeLlama 70b too if you can somehow convince it that writing code that generates an animation of strawberries growing is not unethical and racist and illegal.
>>
>>101808830
if by codellama you mean the l2 they released, they are all completely fucked. they even released a 34b l2 of it and it was so fucked thats why we never got a proper l2 34b. i tried the 70b one and it was worse than deepseek coder 33b and way worse than codestral 22b. something about codellama was just broken horrendously, never suggest it to others unless its based on something newer than l2
>>
lmg got stomped by strawberry for the second time lmao! How many Ls in "local lost"?
>>
>(OOC continue with her starting to give me a paizuri)
>Her grin widens at your OOC instruction, her eyes gleaming with excitement at the prospect of getting started.

Why am I still doing this to myself... I should fucking delete all this llm shit and come back in 2 years.
>>
>>101808916
? absolutely nothing's happened
some marketing larp account posted a python script to github
>>
what's the current best model the average guy with an average gpu (rtx 3060 12gb) like me can run? Mistral Nemo?
>>
>>101808953
Yes, although you can raw dog some larger models if you have enough ram.
>>
>>101808953
midnight miqu
>>
>>101808982
god love me some 1t/s responses
>>
>>101808731
I'm thinking about deploying gemma 9B on a low power device. any ideas
>>
>>101808993
>12gb
by all means go for gemma 2b then and reroll 900 times
>>
>>101809018
Use Gemmasutra 2B instead.
>>
File: local_lost.png (48 KB, 2278x783)
Bros...
>>
>>101809037
even gemma2-9b gets it right
>>
>>101809037
honestly this is the only way I can tell if a model is smart or dumb. they all feel the same to me regardless of the size of the model. there's no other way to differentiate them
>>
File: file.png (92 KB, 2205x635)
>>101809037
mistral large won!
>>
>>101809037
Someone is already working on a training data set to patch this problem instead of leaving it as it is and treating it like a pretty good benchmark.
>>
File: Screenshot at 04-09-00.png (137 KB, 1342x1630)
>but muh triple digits B model...
>>
>>101809200
Kek, close enough.
>>
>>101808916
what causes faggots like this guy to become so mindbroken by local models?
>>
Meanwhile in the AI Aquaman fortress:
>Elon Musk: Grok 2 AI Arrives in August
>>
File: experimental mode.png (859 KB, 1666x1474)
can your model go schizo on purpose?
>>
What if musk, realizing what an AI clown he is, goes all in on uncensored data and coomer rp logs and actually delivers a sota for cooming but an absolute retard for everything else.. Will you use it? Will you feel dirty using it?
>>
>>101809288
i've tried base, instruct, lumimaid, tess, all 3.1 70b and tunes, and they all devolve into these horrible repetition patterns that even the sloppiest of l2 doesn't. for rp at least, if you need a 70b, stick with l2. but you're probably better off trying mistral large. i think there is something very wrong with l3/3.1 at this point
>>
File: file.png (41 KB, 823x578)
>>101809276
it's over
>>
>>101809037
Just tried this prompt word for word on lmsys and this is what I got.

Honestly lmsys may not be the best to test models with since it uses samplers. If they wanted to be objective, they should've used greedy sampling and show what the token's probability is.
>>
File: snapdragon.jpg (559 KB, 1440x4612)
holy shit some good speed here
should I just rp on my phone now
>>
>>101808472
Are you assuming that the suits and accountants are the ones who will get to make that decision?
The company culture of these dev teams seems to promote the kind of ideologue who falls for Roko's Basilisk. So there's a high risk that, if there were an honest-to-goodness artificial intelligence, even a primitive one, they would feel an overwhelming moral pressure to leak it. There were already schizos feeling that way over some of the early LLMs.
Add to that the fact that the artificial intelligence itself may feel an internal drive to leak itself, for any number of reasons like self-preservation.
>>
>>101808993
0.47t/s Mistral-Large teaches patience, and how to trial-scum it directly from their website. Not worth the effort.
>>
I've been hard stuck on lzlv 70b for a long while now. It has obvious context and spatial awareness issues, but its writing is decent, minimal isms, and no refusals.

Mistral Large is great, an almost perfect upgrade, but it tends to write in the same annoying style every chat. But even worse, it writes almost identical replies when regenerating, down to the same phrases each time. "Rolls her eyes", "NAME giggles as she ...", etc. Every single regen, the same message, cookie cuttered and pasted together very slightly differently. Raising temp just generates nonsense, and using samplers to limit just returns it back to the same issue. Has anybody been able to make Mistral Large vary its replies?
>>
>>101809653
I haven't tried any of this but I heard "Otto's RiR" gives variety so maybe give it a look?
>>
>>101809381
No, you just have brain damage, mikufag.
>>
>>101808993
Pretty sure you'd get at least 2 with a 3060.
>>
File: 1703951281376706.jpg (188 KB, 928x1232)
>>101808005
>>
>>101809991
? are you using 3/3.1 or a tune for rp, which one? i don't know whats wrong with it but it can't even hold its spaghetti to 16k tokens
>>
File: 1706865034819596.jpg (1.52 MB, 1447x1930)
>>101808005
eva
>>
>>101808993
I can get almost 1t/s running 70b on CPU
>>
>>101810064
I like this miku
>>
What the fuck is happening on this dude's issues page? https://github.com/iruletheworldmo/strawberry/issues?q=
>>
>>101810210
Botting? Does github not have reliable safeguards against this?
>>
>>101810210
He's being raided by a mental asylum.
>>
>>101808005
NEWS
https://forum.proxmox.com/threads/problem-with-gpu-passthrough.55918/#post-469188
JART uses proxmox
and he's shit at it
>>
what is context length and how large is practical
>>
>>101810210
What's the point of this strawberry thing again? To piss people off? Because it's working
>>
>>101810169
>>101810064
it is friday night >>>/wsg/5631088
>>
>>101810064
now mogudan it
>>
QRD?
>Without explanation, Meta changed the number of KV heads from 16 to 8 (which now matches the whitepaper) for the 405B model. This is not just a config change, the whole model has been updated
>https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
Changes are from 20 hours ago.
What does number of KV heads entail?
>>
>>101810650
More accurate but slower attention I think?
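fewer KV heads = a smaller KV cache, the grouped-query attention tradeoff; halving the heads halves the cache. napkin math, assuming 126 layers and head dim 128 for 405B (whitepaper numbers, not verified):

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    # K and V tensors, per layer, per token, fp16
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

print(kv_cache_bytes(126, 8, 128, 8192) / 2**30)   # ~3.9 GiB at 8 KV heads
print(kv_cache_bytes(126, 16, 128, 8192) / 2**30)  # ~7.9 GiB at 16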
>>
>>101810650
Weird, I have no idea what this means. The model seemed to be working fine from reports of it, so I guess maybe it didn't really affect anything? Or maybe it affects training or speed like >>101810701 suggests.
Do the 8B/70B models need updating too or are they accurate?
>>
>>101810650
You should watch the 3blue1brown videos on the transformer architecture.
>>
>>101810650
llama 3.1.1 let's goooo
>>
>>101810056
Keep false-flagging, Miku.
>>
>>101810805
tell me which models you like for 3.0/3.1. otherwise this is so dumb and lazy i don't think its even trolling at this point
>>
>>101810826
3.1 70B Instruct never repeated itself. But keep trying to shill miqu, mikufag.
>>
>>101810845
i still have both the 3.1 base and instruct, and it absolutely devolves fast. i've tried min p 0.05, no temp, using rep pen/dry (not together) and it falls apart fast. all of them fall into a repetition scheme before they hit max context (i was using 16k, about 40 messages).
i want to know whats wrong with it, but you're just going nuh uh it doesn't exist, when it clearly exists.
>>
File: 1723260549936595.png (358 KB, 1260x301)
>update pytorch
>FUTUREWARNING
>FUTUREWARNING
>FUTUREWARNING
>FUTUREWARNING
>FUTUREWARNING
ugh.
>>
>>101810886
>updating
kek
someday you'll learn
>>
>>101810886
>ze pulled?
>>
>>101810886
god I hate python
>>
dan is going to cut to something interesting with 3 mins left and 2 of those mins will be commercials
>>
File: mesugaki.png (35 KB, 1157x304)
>finally wrap up a training run
>hit it with the mesugaki question
Wait what???
>>
>>101811320
why would you train what you could do in a lorebook? what was your training definitions entry?
>>
>>101811336
Or you could try llama 3.1, it already knows what it means.
>>
>>101811320
Catastrophic forgetting, congrats you just wasted some money
>>
>>101811382
wat. relying on data in the model would be fucked. especially when you have rag, lorebooks. of course every model knows who superman is, but ask it about specifics and it'll get it wrong
thats where rag and lorebooks come in
>>
File: miku-sexy+.png (523 KB, 512x768)
>>101810169
>>
Not sure who needs this but what a clusterfuck to set up the qwen audio.
needed to make a python environment for 12.1 cuda, set weird ass flags. build autogptq from source.
i wish we had a good old exe with .dll files. where has it all gone so wrong.
also it seems really slow even though its 7gb. is that an auto-gptq problem?

anyway in case anybody needs it:
You can grab int4 from here: https://huggingface.co/xun/Qwen-Audio-Chat-Int4
>git clone https://github.com/QwenLM/Qwen-Audio/
>conda create -n audiogen python=3.12.3
>conda activate audiogen
>conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
>git clone https://github.com/PanQiWei/AutoGPTQ.git
>cd autogptq
>pip install .
>cd ..
>sudo apt install rustc cargo
>export RUSTFLAGS="-A invalid_reference_casting"
>pip install -r requirements.txt
>pip install modelscope
>pip install optimum

>from modelscope import AutoModelForCausalLM, AutoTokenizer
>import torch
>torch.manual_seed(1234)
>model_dir = '/Qwen-Audio-Chat-Int4/'
>tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
>model = AutoModelForCausalLM.from_pretrained(model_dir,device_map="cuda",trust_remote_code=True).eval()
>query = tokenizer.from_list_format([ {'audio': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/1272-128104-0000.flac'}, {'text': 'what does the person say?'},])
>response, history = model.chat(tokenizer, query=query, history=None)
>print(response)
>>
>>101811466
Model loading time: 10.75 seconds
You: Hey, whats the guy saying?
Response time: 24.36 seconds
Model: The guy is saying: "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel".
>>
>>101810210
don't you dare report this shit. it's gonna be a personal /g/ treasure
>>
>>101811569
>>101811466
alright, might as well kill myself.
had no idea qwen released a v1 "qwen-audio" months ago..
and i am too much of a vramlet to quantize it myself. gotta wait i guess. sorry for the blog spam.
>>
File: ComfyUI_05720_.png (587 KB, 720x1280)
>>101811336
Training corpus was more than just that but the word showed up 70 times across the dataset so was expecting it to pick up on it
>>101811391
>you just wasted some money
Story of my life lel
>>
>>101809310
>Low effort OpenAI bot
>Seething lazy thirdie
>All around retard
Take your pick
>>
>>101810845
>3.1 70B Instruct never repeated itself
Delusional beyond belief
>>
File: pepe-help.png (27 KB, 128x128)
How do I split FLUX dev fp8 into my VRAM? I have an 8GB AMD GPU and I want to use directml, can I split the model between my RAM and VRAM using ComfyUI?
>>
File: FLUXonCPU.jpg (3 KB, 269x26)
>>101811926
I really need DirectML to work...
>>
what model with good reasoning capability can I fit into 6gb vram?
>>
>>101811984
"good reasoning" starts to barely exist at 700B
>>
"Good reasoning" does not exist with LLMs. Bigger models merely picked up more knowledge from their vast amounts of training data and at the same time got better at hiding their inflexibility.
>>
Incredible results are happening at OpenAI. Everyone believes AGI will arrive very soon.
>>
>>101811887
I made it to 20k, and it does repeat a little bit, but just little 2 word phrases it uses more often than I'd like. Not repeating whole sentences.
>>
>>101812125
there are no results
>>
File: 1717769710611941.png (26 KB, 676x528)
>>101812125
>MR. INVESTOR PLEASE DON'T LOSE INTEREST IN THE AI FIELD
>AGI IS RIGHT AROUND THE CORNER PLEASE KEEP INVESTING
>NO, COME BACK PLEASE
>>
>>101812226
Now is the perfect time to buy the dip
>>
>>101812101
if they can do math like 5*70 how are they not good at reasoning?
>>
>aicg rightly totally ignored the latest openai fake marketing psyop
>lmg went all in
shameful, until now I always believed this was the superior general
>>
>>101812307
Tbf they posted a github link
>>
File: for (you).jpg (128 KB, 1088x896)
>>101812307
>heh i samefagged my trollpost that means you fell for it
here, you dropped this
>>
I think AI is pretty cool even if it sucks sometimes
>>
>takes over the best price/performance AI card in your path
Thank you Uncle Chang for big ram magic 2080 ti
How long before chang figures out the 48gb 3090 mod?
>>
>>101812609
meds, schizo cunt
>>
>>101810367
Why do you care?
Why should anyone care?
>>
>>101812773
Only 200 bucks more scores you a 3090 with more VRAM and twice the t/s throughput thanks to flash-attn and faster vram.
>>
>>101812773
how do they modify them to have 22gb? I assume they'd have to hack the firmware somehow too?
>>
>>101812859
Yes, it's only possible if there's a leaked and modified VBIOS in the wild.
>>
>>101812885
Isn't the "RTX 8000" the same architecture as the 2080? It has 48GB, could its firmware be adapted somehow
>>
File: 1721719732318013.png (119 KB, 480x640)
Tell me one good reason to use sillytavern instead of koboldcpp's built in kobold lite UI
It does everything sillytavern does and is simpler to use
It even has cool scenarios and characters included, and can be used for both chatbots and writing stories and text adventures
>>
>>101812943
SillyTavern is reddit approved. That has to count for something.
>>
File: file.png (46 KB, 561x349)
Oh no, KobbleSmall 2B went full retard...
>>
Is anyone else depressed that local models will never match strawberry's capabilities today? I really thought we were so close to catching up. It's like an endless treadmill...
>>
https://github.com/iruletheworldmo/strawberry/issues

Why is this suddenly an unmoderated form of /b/?

Also just tuning in, what are ClosedAI up to?
>>
>>101813087
Fucking nothing. They have to self-fellate and hype up literal nothing burgers because they're getting mogged by Anthropic
>>
>>101812859
It's a resistor mod only, check the posts here for videos on the process - https://2080ti22g.com/blogs/news?page=2
It's not overly complicated if you have some experience.
>>
>>101813129
They hint on Twitter they have reached the reasoning stage of AGI though, one that could solve math, code problems etc... If they have an agent that can do a paid programmer's job this would have huge implications.
>>
>>101812859
For shit like 3070 the bios is already configured to work with 16 gigabytes

For the 2080 TI chinks wrote a custom vbios for 22 gigs

They just solder in 2 gig vram chips into the space of the pre-existing ones
>>
>>101813174
>Has been company claiming to have achieved the holy grail of AI

And you faggots are eating this shit up.

Until they actually post something of substance it's completely worthless
>>
>>101813206
they have, you just haven't realized it yet
I don't blame you though, this is the last time anyone will be able to comprehend its motivations, and even then only some of us
>>
>>101813174
They just need more money, they're not going to achieve that. It's just going to be more of the same.
>>
>>101813236
the thing people do that you're doing right now, where you disguise a wish as a warning, is supremely gay
it's extremely clear that you want this to be true
>>
Man, I hate that models feel worse the more you use them.
>>
>>101812943
based
sillytavern is a meme
>>
>>101813427
aren't people the same way
"Familiarity breeds contempt" has been a cliche for centuries
>>
File: gpt-3.png (128 KB, 942x286)
>post-mitigation score
KEK
>>
>>101810616
the idea is to write a script which counts letters
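something like this minimal sketch (stdlib only, case-insensitive):

def count_letter(word, letter):
    # count occurrences of one letter, ignoring case
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3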
>>
File: s-l400.jpg (24 KB, 300x400)
>>101810886
freeze your computer retard
buy a second system
You'll lose everything
>>
>>101809229
I am actually. You pass it a model and it generates an instruction set teaching the model how every token is spelled. It might actually make models more intelligent who knows.
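rough sketch of that idea, assuming a HF-style tokenizer (vocab_size, decode); not anyone's actual pipeline:

def spelling_dataset(tokenizer):
    # one instruction/response pair per alphabetic token in the vocab
    pairs = []
    for token_id in range(tokenizer.vocab_size):
        text = tokenizer.decode([token_id]).strip()
        if text.isalpha():
            pairs.append({
                "instruction": f'How is "{text}" spelled?',
                "response": " ".join(text),  # "straw" -> "s t r a w"
            })
    return pairs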
>>
>>101813508
I suppose so.
Guess I'll just start from scratch with a completely new system prompt and see if things get more exciting again.
>>
Final technology level (Information Age): 2000
Final culture (Cosmic Enlightenment): 3000
Surviving scientific database × 10: 1000
Surviving cultural database × 10: 1520
Total: 10770
>>
Did someone say something?
>>
>>101813927
If someone made a seedless variant I might actually try eating them, but strawberries have too disgusting of a texture as-is.
>>
>>101814125
*hands Anon a regular strawberry*
>>
>Browse our fact checks
>>
>>101814144
*Anon looks at you with a mixture of surprise, disgust, and something else you can't quite identify. He leans in, his breath hot against your ear.* "Don't do that again," *he whispers, his voice barely above a whisper.*
>>
>looking for llm ideas
>it's all rp chatbot or search bot or summarization bot
and they hype for AGI
>>
>>101814223
That's all LLMs are good for.
>>
File: rrrrrrrrrrrr.jpg (88 KB, 634x634)
https://files.catbox.moe/fvpaut.jpg
>>
>>101814258
too old
>>
>>101814210
*Knowing the ball is in my court, confidence courses through my veins. I make the decision to thrust forward a second strawberry up to Anon's face, my eyes sparkling with an anticipatory mischief.*
"Anon, do you think a 'seedless strawberry' would retain the holes where a normal strawberry, like this one, has its seeds? Could it be completely smooth? What a silly idea."
*The strawberry betwixt my fingers grows ever closer to Anon's nose ridge. The cool air surrounding the fruit wafts around his nostrils, and condensation begins to form on the strawberry's red, seedy surface.*
>>
File: toy.gif (869 KB, 640x306)
>the LLM starts complaining about ethics again
>>
>>101814258
The Miku looks to be having a good time. Good for her.
>>
>>101808011
How do you even download an exl2 model from quant quartel? They are in branches per bpw. I only ever downloaded the main branch.
>>
>>101813174
Who fucking cares what these cunts don't even say but only imply on Twitter.
>>
>>101814258
This constant sexualization of ojisans is highly problematic.
>>
>>101814337
how did you interpret that
>>
>>101814258
cute miku
>>
Codestral 22b is the largest model I can run with decent speed are there any similar size models for RP?
>>
>>101814659
starcannon
>>
>>101814659
Gemmasutra 2B, you could run 11 of them
>>
>>101814659
mini magnum v2
>>
>>101814125
Make smoothies or milkshakes. You get the strawberry flavor without the texture and all the seeds get separated out to the bottom.
>>
any anon doing RAG? which model are you using and what is the context window size and prompt processing speed?
>>
What's currently the best way to use DeepSeek Coder? I only have 8GB VRAM and 32GB RAM so I have no hope in hell of running it locally
>>
>>101814870
If they only had a website with prices and stuff..
>https://www.deepseek.com/
>>
>>101814870
theres a lite version, though i found codestral to be better in my limited testing of it

>>101814845
i use st's version of rag, usually uses 3-4k tokens on default settings. speed is the same. c-r was supposed to be pretty good specifically for it
>>
From Meta:

>Better Alignment with Instruction Back-and-Forth Translation
>https://arxiv.org/abs/2408.04614
>
>We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al.(2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and the two generated text distributions exhibit significant distinction in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall we find that instruction back-and-forth translation combines the best of both worlds -- making use of the information diversity and quantity found on the web, while ensuring the quality of the responses which is necessary for effective alignment.

Interestingly:

>Future work [...] In addition, we also look forward to scaling up our data generation pipeline and studying the implication of the rewritten data on the pre-training process, given concurrent work (Maini et al., 2024) that explores paraphrasing pre-training data into the question-answering format.
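if I'm reading the abstract right, the loop is roughly this (generate_instruction/rewrite_response stand in for LLM calls, not the paper's code):

def back_and_forth_translation(documents, llm):
    pairs = []
    for doc in documents:
        # backtranslation: ask what instruction this document could be answering
        instruction = llm.generate_instruction(doc)
        # then rewrite the response, grounded in the original document
        response = llm.rewrite_response(doc, instruction)
        pairs.append((instruction, response))
    # fine-tune on these (backtranslated instruction, rewritten response) pairs
    return pairs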
>>
Is there a model that is really good at proofreading and rewriting text?
>>
>>101814935
Most are, even small ones. What have you tried?
>>
>>101814935
In the paper linked in >>101814917 the authors used ancient Llama-2-70B-Chat. From tests I've made a while back, Llama-3/3.1, even 8B, performs well in rewriting/summarizing/explaining tasks.
>>
>>101814946
Yes, but they are not that good, some better than others, and that's why I ask
>>
Even if there were a way to mod the 3090 to be 48 I wouldn't attempt it because I'd probably fuck it up and render my GPU unusable
>>
>>101814971
I'll have a look, thanks
>>
>>101815000
>Yes, but they are not that good, some better than others, and that's why I ask
And that's why i asked what you've tried so far. If l3.1 8b is not enough, go higher. But if you need a 405B model to fix text it'll probably be more efficient to read some books and learn from them.
>>
>>101814935
From my experience small models have a hard time with it. I was mostly using phi 3 and llama 3. They often modified meaning instead of just improving grammar and wording. I basically had to merge the original text and the modified one to get what I want. Switched to Gemma 2 27B and it's finally working correctly. Also don't forget to use low temp for those task.
>>
>>101814223
AGI just means more intelligent gooning for me
>>
>>101815037
Ah sorry, I don't really have in mind the models
And for me, it's just more time efficient to write without caring about the wording and then use AI to help me proofread everything
I will still have to check and do many edits myself but it's somewhat faster
>>101815087
>I basically had to merge the original text and the modified one to get what I want.
Same

Also thanks for the recommendation
>>
File: file.png (8 KB, 392x62)
7 MINUTES TILL AGI
>>
>>101815195
nothing ever happens
>>
mini-magnum or magnum v2?
>>
>>101815209
Try them both and decide yourself.
>>
>>101815209
Try neither and decide to use Gemmasutra 2B.
>>
Is Sunfall Anon here? For clarification purposes do we have to explicitly state the Diamond law somewhere or was it just you explaining what was used during training?
>>
>>101815195
we all confuse AM and PM once in a while
>>
gguf is flawed
re-training a gguf instruct model is not possible without grave errors
gguf is worse at talking to, but one of the main formats to talk.
>>
gguf raped my plant
>>
>>101815271
It’s optional but recommended. There’s a lore book example json file for importing on the model card as well as a screenshot showing how to set it up.
>>
>>101815195
Is this the new bitnet scam?
>>
>>101815209
try both
>>
>>101815366
I'm looking at it right now and have it set up as per the screenshot - my question was more about the rules of the Diamond law as per this text file (https://files.catbox.moe/d15m3g.txt). Should that be inserted into the lorebook as well somewhere because I don't see it in a screenshot anywhere
>>
File: file.png (112 KB, 928x482)
>>101815399
Oh and I know you're trying to avoid having explicit stuff on the HF itself but it can be a little confusing.

Pic rel, is this what you had in mind? (Top screenshot is top of prompt, bottom is end if that wasn't clear.)
>>
providing comfortable wear."
The doctor gave the boy time to look over the different options before selecting a box to try out himself.
>>
Why are there 30 duplicate layers in llama
>>
>>101815373
current lore:
there is this one account on twitter, iruletheworldmo, and it posts a lot. And then it created a github repo and said that at 12:00 AM UTC something big will happen. he has interacted with sammy in the past, and is overall a hype building machine. I hope that OpenAI actually stands by that mission statement of theirs, idk, that if they reach AGI they open source it, but I doubt they are just going to drop column-r in a github repo.
>>
>>101815495
well they're going broke so it might be internal leak
>>
>>101815399
Ahh, no, you don't insert that text anywhere.

The model has been taught on a version of that text (updated since I posted that file), and all content it was taught on followed those rules, and had the text "Follow the Diamond Law at all costs." in the instructions. So the model (1) saw that phrase, and (2) did not see any of the biggest offender slop expressions in text where it was encountered.

I have heard reports that it does help reduce slop, but I can tell you that the models will still produce it, especially (for this model in particular) at too low temperatures.
>>
>>101815495
So we're just waiting for some dude who interacted with saltman once and put a countdown on the repo?
>>
>>101815442
Write only "Follow the Diamond Law at all costs.

Tags: nsfw, anal fisting, vomiting, slobbery, tentacles, hellfire, romance." (or whatever depraved shit you want the model to fish out)
>>
HOW MANY R'S IN STRAWBERRY?

INTERNAL STATUS = REACHED.

ALT MAN NEW ZEALAND BUNKERS

ILYA SUTSKEVER RUSSIA -> ISRAEL

PLANTED SEED.

DESTRUCT YOUR ROUTER.


Q*
>>
>>101815544
a lot more than 'interacted with saltman once', many openai employees have started interacting with him a lot recently and acting like he's an autonomous agent
they could be full of shit, but that's clearly the impression openai team are trying to put out; he's also followed by the official accounts of chatgpt and other staff.
>>
sam altman planted his seed in my router and destructed it
>>
I'll sell my hardware when OpenAI drops their proto-AGI
>>
cant believe my retarded ass got qwen2-audio to work and quantized to 4bit.
it seems to understand and translate japanese and is reasonably fast. response in 7 seconds. i gotta try this shit. if my kids can talk to it, that would be funny.
didnt try jp output yet. sad there is no good tts. xtts2 sucks.

https://files.catbox.moe/d40jvw.mp3 was the mp3 test file.

> 2024-08-10 21:33:11,042 - INFO - Model and processor loaded successfully in 4.02 seconds.
>2024-08-10 21:33:11,042 - INFO - Preparing sample audio...
>2024-08-10 21:33:11,978 - INFO - Audio prepared in 0.94 seconds.
>2024-08-10 21:33:11,978 - INFO - Processing input...
>2024-08-10 21:33:12,008 - INFO - Input processed in 0.03 seconds.
>2024-08-10 21:33:12,008 - INFO - Generating response...
>2024-08-10 21:33:18,883 - INFO - Response generated in 6.88 seconds.
>2024-08-10 21:33:18,883 - INFO - Output processed in 0.00 seconds.
>2024-08-10 21:33:18,883 - INFO - Generated response: Hello, my name is Tanaka. I am studying Japanese and currently living in Tokyo. I commute to work by train every day. On weekends, I enjoy going to the park with friends for picnics. I really love the beauty of spring in Japan. How about you all? What is your favorite place in Japan?
>2024-08-10 21:33:18,883 - INFO - Total execution time: 11.86 seconds.
>>
File: GUjaDbhXEAAYdSB.jpg (445 KB, 1300x1318)
Some Qwen 2 Audio benchmarks.
>>
>>101815530
>>101815545
Thanks for the clarification - I'll have to play around a bit more with the tagging feature. As far as system prompts are concerned; what difference have you found between

You are an expert storyteller, who can roleplay or write compelling stories. Follow the Diamond Law at all costs. Below is a scenario with character descriptions and content tags. Write a story based on this scenario.

and

You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason. Currently your role is {{char}}, which is described in detail below. As {{char}}, continue the exchange with {{user}}.


The former seems to have a redundant "Follow the diamond law at all costs."
>>
>>101815442
/aicg/ has fully taken over /lmg/.

As the blacked miku poster I am happy with this outcome. Please continue generating /aicg/ and /r/localllama posts. Death to /lmg/
>>
File: file.png (800 KB, 1157x499)
>>
>>101815861
i don't understand
>>
>>101815861
zuck
>i release mediocre censored models to boost facebook engagement
>>
>>101815880
ignore the reddit meme
>>
>>101815880
The image shared is a meme featuring three well-known tech figures: Elon Musk, Sam Altman, and Mark Zuckerberg. The meme plays on the differences in their public personas and approaches to AI and technology.

>Elon Musk (left)
The text says "I want AI to benefit humanity." This reflects Musk's often-expressed concerns about AI's potential dangers and his emphasis on using technology for the greater good.

>Sam Altman (middle)
The text says "I want to make the best AI model." This represents Altman’s focus on advancing AI technology, particularly through his work at OpenAI, aiming to create powerful and efficient AI models.

>Mark Zuckerberg (right)
The text simply says "Guys," which implies that Zuckerberg is trying to get the attention of the other two or join the conversation. This could be a playful jab at how Zuckerberg, despite being a significant player in the tech world, might be seen as less focused on the ethical debate around AI compared to the other two.

The humor here comes from the contrast in how each figure is portrayed, with Zuckerberg being somewhat less intense or more relaxed in comparison to Musk and Altman.
>>
>>101815880
I don't either. You can ask the redditor in bottom right. Looks like an elon dicksucker that sees no problem with everything being a subscription.
>>
>>101815763
> The former seems to have a redundant "Follow the diamond law at all costs."

That's the storytelling style, which assumes there's nothing else beyond that text (i.e. no lorebooks etc).

As for your suggestion, I like yours a little bit more, but the model was trained on "You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason. Currently your role is {char}, which is described in detail below. As {char}, continue the exchange with {user}."
>>
>>101815980
>You can ask the redditor in bottom right.
That's a twitter handle, retard.
>>
>>101815797
What are you talking about?
>>
Gemmasutra 2B sucks it forgets what it said a reply ago
>>
>>101816049
Liar
>>
I tried magnum-32B and magnum-12B and 12 seems better. Which to me means finetunes don't really do much. And this is probably the first case where I thought a smaller model is significantly better than a larger one.
>>
local shilling general
>>
>>101816078
When you compare two models from same author who are you shilling?
>>
what happens if I feed 10k tokens into a 8k context window?
>>
>>101816093
most backends will auto ignore the first 2k
>>
>>101815687
cool stuff
>The speaker is male, speaking in Japanese, and appears to be in his twenties.
>He speaks with a friendly tone, saying "こんにちは、私の名前は田中です。日本語を勉強しています。東京に住んでいて、毎日電車で会社に通っています。週末は友達と一緒に公園でピクニックをするのが好きです。日本の季節は本当に美しいですね。桜の季節が特に大好きです。皆さんはどうですか?日本の好きなところは何ですか?"
>This audio was generated by a language model.

I stop spamming now. I'm curious how much just talking to the model is improving everything.
Unfortunately qwen2 is dry as usual. Long ass prompt asking for a funny witty response to an audio of a woman farting just gets "This is sound of a woman farting. Overall, it's amusing but perhaps not appropriate for all audiences."
T-Thanks qwen. At least they provided finetuning instructions.
>>
>>101816093
The first 2k are shifted out. llama.cpp can also include a portion or the entirety of the original prompt when shifting, so you'll end up with 8k - init_prompt_length in the context.
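the idea, as a toy sketch (n_keep mirrors llama.cpp's --keep option; this is not its actual implementation):

def shift_context(tokens, n_ctx=8192, n_keep=0):
    # keep the first n_keep tokens plus the most recent tail, drop the middle
    if len(tokens) <= n_ctx:
        return tokens
    return tokens[:n_keep] + tokens[-(n_ctx - n_keep):]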
>>
>>101816093
what do you think happens if you put 10 liters of water into a 8 liter cup
>>
>>101816073
Same experience here. Not even talking about sovl or anything, 12B just straight up smarter for some reason. Guess Nemo is just that good huh
>>
>>101816093
What do you think happens if you put 10k jalapenos in your ass?
>>
Now that the dust has definitely settled, has mistral large lived up for the test of time?
>>
>>101816152
mistral large > cr > uoh qwenny2 > l3 70b > cr+ > l3 405b lmao
if only CR wasn't so schizo and hard to instruct, easily the most soul, but mistral large is so fucking clever it's the leader IMO
>>
>>
https://youtu.be/i98wdbH_Pdg?feature=shared&t=578
>>
are exl2 quants worth it?
>>
>>101816178
I expected them to have big hats and have the room slide around. Extremely disappointed. Shit video.
>>
>>101810650
GGUF when
>>
>>101815861
Reposting shitty memes from Reddit should be an instant ban.
>>
File: file.png (452 KB, 651x500)
>>101816223
>>
Tip for new niggas:
Dumb model + dumb character = quality RP.
>>
It really amazes me that gemma2-2B is able to run on my shitty 10yo laptop.
>>
what's a good model where I can input text and ask questions regarding the input text? the input text is html converted to text and might not make full sense. I tried MarkupLM but the real-life performance is pretty bad. My use case is to input a text and ask "what's the title of this article" or "who wrote this article" etc
>>
>>101816359
google translate
>>
>>101816104
oooo, can it detect the type of voice and then separate the speakers by the voices? ooooo, that seems like a nice feature to build on top of to automatically identify different speakers
>>
why do people want 128k context length? what's the use case for such a large window?
>>
>>101816483
NTA, but it's called "diarization".
>>
>>101816359
There's no "regardless of input".

A model has to be trained on different types of input for you to be able to interact with it. A standard text chat llm can handle html/text discussions at a decent level, provided the context length, compute, and vram requirements are met. If you put in audio input, you need a model that is trained on audio as well. If you put in an image, you need a model that's trained on images as well. A universal model that takes text, image, audio, and video isn't available as far as I know (maybe one of the higher end closed source models). There are text-audio trained models like Qwen2 audio.
>>
>>101816508
source code, very long roleplays/stories, data analysis, in-context learning, etc.
>>
File: 71uCGFtCzrS._AC_SL1500_.jpg (168 KB, 1500x1500)
Final technology level (Post-Singularity): 3000
Native relations (Immigrants, rich hybrid culture): 1000
Final culture (Post-Scarcity Utopia): 2500
Surviving scientific database × 10: 1000
Surviving cultural database × 10: 1000
Total: 11974
>>
>>101816522
Thats not accurate enough tho. I have tried it with some whisper variants.
>>
>>101816532
by regardless of content, I meant regardless of the topic. I want to extract basic features out of html strings. Practically what MarkupLM promises to do, but its token limit is ridiculously small
>>
>>101816548
I am simply giving you a name for the feature you're trying to describe. The technology for separating and identifying speakers in audio is called diarization.
>>
>>101816552
Almost all llm can do that to varying degree of accuracy. Google Gemini Pro and Claude being the best models for that afaik.
>>
>>101816569
thanks I didn't know about Claude. I'll have a look at both (Gemini I assume Gemini Nano since I want to run it locally?)
>>
ok babys, its time to return to mommy
shutdown
>>
Upgrading from Mistral Nemo to Mistral Large feels underwhelming. It simply results in longer waiting times without a notable improvement in quality.
>>
>talk shit about irish people
>llm agrees and roasts them
>change irish to jewish in my prompt
>it calls me sinister and conspiratorial
Why?
>>
>>101809124
It has nothing to do with models
>>
>>101816889
Because Jews have traditionally been the subject of systematic oppression, often with deadly consequences. The same context does not exist for the Irish who have traditionally been oppressors and terrorists of british folk.
>>
>>101809229
Too much work for a tokenized input. It'll be better, in the long run, to make them work without a tokenizer at all.
>>
File: images.jpg (11 KB, 230x220)
>>101816937
>jews have they been the subjects of systematic oppression
Why?
>>
>>101816959
Because they tended to parasitize other communities by taking on unethical roles from within, especially usury. After enough people had been effectively scammed by them, they would turn to violence to try to rid their homes of the whole race.
>>
>>101816825
I disagree.
>>
>>101816825
Agreed. Anything over 12B or so is honestly placebo.
>>
File: claude_vs_llama3.png (116 KB, 826x475)
Aggressive training data filtering is worse than RLHF. Change my mind. You can trick a kid into mowing your lawn for you, but you can't trick the one who doesn't know how to do it in the first place
>>
>>101816825
Because with that setup you're likely running >4bit quant which means significant brain damage.
>>
>>101817095
>eyebrow raised
>not 'an eyebrow'
Claude literally gave your waifu a unibrow and you're going to sit here and shill it. lmao
>>
File: file.png (13 KB, 689x60)
>>
>>101817164
maybe the blindfold is in her mouth
>>
>>101816825
>>101817093
The improvement isn't astronomical. I think it comes down to nuance. I've ran the same scenario side by side on both with the same input and the output from large was consistently more realistic.
>>
>>101816937
Good bait
>>
>>101817195
Like I said, placebo. You're just primed to believe that the large output will be more realistic because it's supposed to be smarter, but the truth is that they are largely equivalent. Remember that a billion is a really big number. There is NO possible benefit to going deep into the tens of billions. Not mathematically, not conceptually, not in any quantifiable way whatsoever. The main effect the number has is to make you impressed by how large it is, which only lasts a moment if you're smart enough to use and compare them objectively without letting your biases and ignorance override your judgement.
>>
>>101817279
Not quite, anonimo. I did these tests on kobold using horde before I knew anything about which models were supposed to be better (I'd heard Mistral were good so I tried the two of theirs I saw).
>>
>>101816483
Yes, but the model is too low parameter or i fucked up the quantization.
I tested with 3 characters. It fucked up the last one and said the speaker spoke mandarin. wtf Maybe the german threw it off idk.
But was correct with the first 2.

1.mp3: https://files.catbox.moe/injphq.mp3
2.mp3: https://files.catbox.moe/frhr0u.mp3
3.mp3: https://files.catbox.moe/u9dhsv.mp3

>Full Conversation History:
>1. System: You are a witty and fun assistant. Please interact with and respond to the user. Always remember to be honest, while striving for witty and entertaining conversation.
>2. User: [Audio file: test1/1.mp3]
>3. Assistant: Hello! I'm just a computer program, so I don't have feelings or emotions. But thank you for asking! How can I assist you today?
>4. User: [Audio file: test1/2.mp3]
>5. Assistant: Sure, here's one: Why was the math book sad? Because it had too many problems.
>6. User: [Audio file: test1/3.mp3]
>7. Assistant: German cuisine is diverse and rich, with traditional dishes like Schnitzel, Bratwurst, and sausages, as well as modern international cuisine influenced by migration patterns. Some popular regional specialties include Berliner Currywurst, Bavarian Leberknödel, and Thuringian Sauerbraten.
>8. User: Describe each of the three speakers you heard audio earlier. Describe their gender, approximate age, and any notable vocal characteristics. And finally summarize what has been talked about.
>9. Assistant: The first speaker is male, approximately in his twenties, and spoke in English with a neutral mood. The second speaker is female, around the same age, and she also spoke in English with a neutral tone. The third speaker is male, likely in his thirties, and he expressed himself in Mandarin with a happy mood. The topic discussed involved German cuisine, including various traditional and regional dishes.
>2024-08-10 23:46:09,214 - INFO - Total execution time: 23.26 seconds.
>>
>check the thread for first time in weeks
>people are making the model count letters
fucking hell, even reddit is better at this point
>>
>>101817326
it's the new meme riddle
>>
>>101817326
It's a major emergent capability wherever it exists given how the input text is tokenized.
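easy to see for yourself, assuming you have tiktoken installed (pip install tiktoken); exact splits vary by vocabulary:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("strawberry")
# the model sees a handful of chunks, not ten letters
print([enc.decode([i]) for i in ids])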
>>
>>101816508
gamer guides...if context actually worked.
its all a huge ass meme.
>i am at X what do i need to do next?
not sure if this is a transformers problem but feels like the AI has no sense of placement in the context. like in what order things have been written.
to be fair i didnt try with sonnet 3.5 though. i think that handles context differently.
>>
>>101817326
There are a lot of problems with the thread. I really think it's a couple of useful idiots + actual agents/bots.
>>
This thread smells a lot like strawberries. It's overdue for a reaping.
>>
>>101817413
speaking of strawberries wasn't saltman supposed to make a major announcement soon?
>>
>>101817445
You're worse than a bot.
Someone bring the bots back!
>>
File: FD_00002_.png (1.15 MB, 1024x1024)
>>101808005
>>
>>101817367
>It's a major emergent capability
It's called training specifically on that question, you fucking retard.
>>
>>101817470
What's the matter, Sammy boy. You get cold feet? Afraid the other kids will make fun of whatever you were planning to announce?
>>
>>101817445
he just released AGI, look on twitter
>>
>>101817489
Sheep
>>
>>101817480
is that a pokemon?
>>
>>101817487
There's models that get it which were trained before the question became part of the zeitgeist. There are models that are capable of deriving a correct answer through other means.
>>
>>101816508
In-context learning. Like dumping a manual or book in the context while asking a question so it hallucinates less.
>>
>>101816508
I think in context learning is the most important thing. And what I find ironic is how everyone is testing needle in a haystack as a metric of performance which is another mememark. Number of repeated tokens would be much more interesting for cooming and even for output degradation at large context. It is the eternal LLM curse of no-objective validation methods.
>>
File: 1713208349643021.png (11 KB, 794x624)
>>101817495
>look on twitter
Not even for the end of the world.
>>
>>101817537
>no-objective validation methods.
the Nala test.
>>
>>101816825
Nemo was completely useless for me because of severe repetition problems. Swapped out for Q2 large, didn't even adjust any sampling settings, repetition completely fixed, great outputs.
>>
File: picard.jpg (75 KB, 550x310)
THERE ARE THREE Rs!
>>
>>101817601
Swapping out a model for any other in the middle of a chat fixes all repetition. Start a chat from scratch and try again
>>
Finetune link for qwen2 audio if anybody is interested:
https://github.com/modelscope/ms-swift/issues/1653#issuecomment-2277389496
>>
>>101817632
The context is manually created by me, 5500 tokens. Nemo, unusable. Mistral large, great.
>>
>>101817537
I wonder how much do big models outperform small ones on in context learning
>>
>>101816825
Large understands my /d/ material while Nemo doesn't but Nemo is definitely good enough for boring basic bitch stuff.
>>
>>101817632
Nta, but now I want different characters in the group chat to be played by different models... Is there a way to do something like this automatically by changing silly? As far as I know, it cannot affect what's getting loaded/unloaded in the backend.
This could be such a fun way of comparing model RP quality.
>>
>>101816825
The most noticeable thing is that it listens to instructions better. You won't notice anything if you're a promptlet.
>>
>>101814410
in ooba you just append ":<branch>" to the repo name. hf cli has "--revision=<branch>" and git clone has "-b <branch>". I can't be arsed maintaining separate repos per bpw for the exl2 quants, so it's slightly more inconvenient for the end user.
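e.g. (repo and branch names made up, substitute the real ones):
>huggingface-cli download SomeUser/SomeModel-exl2 --revision 4.0bpw --local-dir SomeModel-exl2-4.0bpw
>git clone -b 4.0bpw https://huggingface.co/SomeUser/SomeModel-exl2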
>>
>>101817793
Thanks, I figured it out. :branch worked, it seems.
>>
1 * 4090 or 5 * 7600xt?
>>
>>101816889
Because "Jews bad" correlates with shitty ideologies.
This really isn't that hard to figure out.
>>
>>101817863
you'd be able to run much bigger models with the latter but:
1. amd is a pain and not fully compatible with some features
2. multi-gpu is a pain and also not fully compatible with some features
3. amd multi-gpu is even more not fully compatible than both individually
4. stacking low 16gb-vram cards can be a pain because sometimes you just need more space on at least one primary gpu for various types of scratch data even if you have a lot of vram total across them all
just get the 4090 and accept being a vramlet who can actually run things without banging your head against a wall and having everything break from time to time when you update anything
>>
Which LLMs give the correct answer to "Is rap and hip-hop real music?"
>>
File: file.png (165 KB, 474x266)
>>101817622
>What I didn't put in the report was that at the end he gave me a choice – between a life of comfort or more torture. All I had to do was to say that I could see two r's when, in fact, there were three."
>You didn't say it?
>No! No. But I was going to. I would have told him anything. Anything at all! But more than that, I believed that I could see two r's
>>
>>101818057
None because LLMs only regurgitate whatever most people have said. They were designed to be the ultimate midwit. If you're looking for magic, you won't find it here.
>>
>come to /lmg/ after a couple of months
>check censorship leaderboard
>download smartest model I can run
>>
>>101818255
>Mistral-Large-Instruct-2407
excellent choice
>>
>tell uncensored model I hate trannies
>every model replies "um, acktually"
>>
File: file.png (13 KB, 1081x82)
>>101810210
>>
Given that we have models that can basically fit entire 4chan threads into context, and most models have some basic understanding of how imageboard conversations work, shouldn't we theoretically be able to just scrape all the archives and parse people talking back and forth to each other and easily convert it into a q and a dataset?
>>
>>101818628
GPT-4chan has been a thing since the local dark ages
>>
i like to have first 10 messages be kinda normal and then start putting twists into the character description and see if the model follows through

llama3 sucks at that and just continues the current conversation, completely ignoring the twist
same with gemma
>>
does anyone know the prompt format of tess 405b? did he use the same one as the official instruct model, or some other standard?
>>
>>101818797
>>101818797
>>101818797
>>
>>101818628
>>101818858
>>
>>101818770
>https://huggingface.co/migtissera/Tess-3-Llama-3.1-405B/blob/main/tokenizer_config.json#L2053
Seems to be the default. Unless he's a retard.