• 0 Posts
  • 89 Comments
Joined 7 months ago
Cake day: September 9th, 2025







  • melfie@lemy.lol to memes@lemmy.world · Full circle
    7 points · 17 days ago

    Just finished watching The Dinosaurs series, narrated by Morgan Freeman. I enjoyed the series overall, though I do find it difficult to suspend my disbelief and stop wondering what shit they completely made up, what has a firm scientific basis, and the extent to which the current understanding will be laughable in 20 years.








  • melfie@lemy.lol to Selfhosted@lemmy.world · Server ROI Calculator
    7 points · 2 months ago

    I asked a friend who owns a small plane whether it saves money vs. flying commercial, and the answer was an emphatic no: it’s about the freedom, convenience, and love of aviation. Realistically, self-hosting is the same, albeit a lot cheaper.

    That being said, I wouldn’t mind having a runway in my backyard with a steel building to store a small plane, although the stakes are lower if you forget to maintain your server.



  • melfie@lemy.lol to Selfhosted@lemmy.world · Selfhosted coding assistant?
    0 points · edited · 2 months ago

    The main thing that has stopped me from running models like this so far is VRAM. My server has an RTX 4060 with 8GB, and I’m not sure that can reasonably run a model like this.

    Edit:

    This calculator seems pretty useful: https://apxml.com/tools/vram-calculator

    According to this, I can run Qwen3 14B with a 4-bit quant and 15-20% CPU/NVMe offloading and get 41 tokens/s. It seems a 4-bit quant reduces accuracy by 5-15%.

    The calculator even says I can run the flagship model with 100% NVMe offloading and get 4 tokens/s.

    I didn’t realize NVMe offloading was even a thing, and I’m not sure whether it’s actually supported or works well in practice. If so, it’s a game changer.
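    As a rough sanity check on the calculator’s numbers, weight memory alone is roughly parameters × bits-per-weight ÷ 8. This is a back-of-envelope sketch only: real GGUF quants mix bit widths and the KV cache adds more on top, so treat these as lower bounds.

    ```python
    # Approximate weight memory: params * bits_per_weight / 8, in decimal GB.
    def weight_gb(params_billion: float, bits: float) -> float:
        return params_billion * 1e9 * bits / 8 / 1e9

    # Qwen3 14B at 4-bit: ~7 GB of weights alone, already close to an
    # 8 GB card's limit before the KV cache and activations are counted.
    print(round(weight_gb(14, 4), 1))   # 7.0
    print(round(weight_gb(14, 16), 1))  # 28.0 (same model unquantized at fp16)
    ```

    That gap is why some CPU/NVMe offloading is unavoidable on an 8GB card even with an aggressive quant.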

    Edit:

    The llama.cpp docs do mention that models are memory-mapped by default and loaded into memory as needed. I’m not sure whether that means an MoE model like qwen3 235b can run with 8GB of VRAM and 16GB of RAM, albeit at a speed that is an order of magnitude slower, like the calculator suggests is possible.
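    For reference, here’s a sketch of what that split looks like with llama.cpp’s llama-cli. The model filename and layer count are hypothetical; in practice you’d tune --n-gpu-layers until it fits in 8GB.

    ```shell
    # Hypothetical invocation: offload 20 transformer layers to the GPU and
    # leave the rest memory-mapped from disk (mmap is llama.cpp's default).
    ./llama-cli -m ./qwen3-14b-q4_k_m.gguf \
      --n-gpu-layers 20 \
      -p "Write a haiku about VRAM."
    # --no-mmap instead forces the whole model to be loaded into RAM up front.
    ```

    With mmap, pages the model doesn’t touch stay on disk, which is what makes running something larger than RAM possible at all, at the cost of NVMe read speed on every cold page.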