What's the deal with LlamaCPP and caching?

j4k3@lemmy.world · 7 months ago

What's the deal with LlamaCPP and caching?

rufus@discuss.tchncs.de · 7 months ago

You probably just have different settings (temperature, repetition_penalty, top_x, min/max_p, mirostat …) than what you had with python. And those settings seem way better. You could check and compare the model settings.

webghost0101@sopuli.xyz · 7 months ago

What hardware do you have to run 70B and how long does generating take?

j4k3@lemmy.world · 7 months ago

Just a laptop with 12th gen i7, 16gb 3080Ti, and 64gb of DDR5 system memory.

webghost0101@sopuli.xyz · 7 months ago

Thats a juicy amount of memmory for just a laptop.

Interesting, the fosai site made it appear like 70B models are near impossible to run requiring 40B gb of VRam but i suppose it can work with less But slower.

The vram of your gpu seems to be the biggest factor. A reason why while my current gpu is dying i cant get myself to spend on a mere 12 gb 4070ti