/g/ - Llamafile - Technology


Anonymous
Llamafile 11/04/25(Tue)16:31:53 No.107106564

File: 1731429531133813.png (795 KB, 1323x813)

it just werks

Anonymous
11/04/25(Tue)16:38:17 No.107106628

>>107106564
>can't have files over 4GB
straight into the trash

Anonymous
11/04/25(Tue)16:46:58 No.107106699

>>107106628
that's because windows fucking sucks.

Anonymous
11/04/25(Tue)16:52:57 No.107106746

File: 1745104921963826.png (488 KB, 1839x776)

>>107106628
>>107106699
idgi, do these not work?

Anonymous
11/04/25(Tue)16:54:22 No.107106755

>>107106699
Who is still using Fat32. Are you retarded?

Anonymous
11/04/25(Tue)16:56:04 No.107106768

File: file.png (100 KB, 1233x144)

>>107106746
windows has a 4GB file size limit for executables.
>>107106755
retard

Anonymous
11/04/25(Tue)17:03:55 No.107106832

>>107106564
>troonware
no thanks, still using the superior llama.cpp, thanks

Anonymous
11/04/25(Tue)17:04:35 No.107106840

File: 1745993400598726.png (447 KB, 1502x558)

>>107106768
kek imagine using winshit, what the fuck are we still in x86_32 days?
and yeah, apparently you gotta split it into 2 files (the executable and the weights)

Anonymous
11/04/25(Tue)17:11:30 No.107106904

for me it's llama.cpp server in docker, anyone on the lan can access it through its http port

Anonymous
11/04/25(Tue)18:50:06 No.107107844

>>107106904
too bloated if you just need a small fast model to run in your code editor. llamafile is simple

Anonymous
11/04/25(Tue)22:50:59 No.107109379

>>107106564
Bump. Good morning sir!

Anonymous
11/05/25(Wed)05:03:24 No.107110830

>>107109379
good morning

Anonymous
11/05/25(Wed)06:29:09 No.107111272

>>107106699
why would someone use Windows?

Anonymous
11/05/25(Wed)06:31:47 No.107111287

>>107111272
i had to in a workplace several times, on an airgapped network. if llamafiles had existed back then it would have saved me a fuckton of time

Anonymous
11/05/25(Wed)11:13:32 No.107113200

>>107111272
why are you a retard?

Anonymous
11/05/25(Wed)12:22:37 No.107113797

>>107106699
>runs llms on windows
just use online services if you don't give a fuck

Anonymous
11/05/25(Wed)12:58:19 No.107114088

>>107106564
coded by a tranny that tried to sabotage the upstream project
search "justine" tunney for more

Anonymous
11/05/25(Wed)13:48:15 No.107114504

>>107106564
> just werks
> built in sandboxing on linux
> reasonable token production
If you know someone who isn't computer literate, but wants to try local LLMs, this is a good solution. Lots of options. I suggest
> Mistral 7B Instruct v0.3 Q4 (fast general purpose)
> Google Gemma 3 12B it Q4_K_M (general purpose)
> Gemma 2 27B it Q6_K (slow general purpose)
> Qwen2.5.1 Coder 7B Instruct Q8 (fast coding helper)
> Qwen2.5 Coder 14B Q6_K (slow coding helper)

Anonymous
11/05/25(Wed)13:51:47 No.107114547

>>107114504
Oh, I almost forgot. You can load external GGUFs with this, too.
> ./Mistral-7B-Instruct-v0.3.Q4_0.llamafile -m <your-gguf-model-here>

Anonymous
11/05/25(Wed)14:09:03 No.107114705

>yet another ai chatbot
holy yawn

Anonymous
11/05/25(Wed)14:31:22 No.107114919

>>107114504
i'm trying to read up on LLMs and quants
q4 seems bad, i don't quite get the difference between q6k and q8
how can u do 6bit?
just how big is the difference between a q6k 7b and a 13b?
it seems q6 is the sweet spot, and the K M suffixes seem to imply better trained

Anonymous
11/05/25(Wed)15:38:41 No.107115623

>>107114919
>how can u do 6bit?
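you bit-pack them: a 6-bit value leaves 2 bits of the byte empty, so the next value's bits go in there, and 4 values end up fitting in 3 bytes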
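
Rough sketch of the idea in Python, in case it helps. This is generic bit-packing, not the real thing: as far as I know llama.cpp's Q6_K keeps the low 4 bits and the high 2 bits of each weight in separate arrays and adds per-block scales, so treat `pack6` and `unpack6` as made-up names for illustration only.

```python
def pack6(values):
    """Pack 6-bit integers (0..63) into bytes: 4 values -> 3 bytes."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]
        if any(not 0 <= v < 64 for v in (a, b, c, d)):
            raise ValueError("every value must fit in 6 bits")
        # Four 6-bit fields laid end to end form one 24-bit word.
        word = (a << 18) | (b << 12) | (c << 6) | d
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack6(data):
    """Inverse of pack6: 3 bytes -> 4 values in 0..63."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    values = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        values.extend(((word >> 18) & 63, (word >> 12) & 63,
                       (word >> 6) & 63, word & 63))
    return values
```

So 64 six-bit values take 48 bytes instead of 64, and `unpack6(pack6(x))` gives `x` back. Real quant formats also store scale factors per block, which is why the effective bits per weight come out a bit above 6.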
