• The Hobbyist@lemmy.zip
    3 hours ago

    Ollama, latest version. I have it set up with Open-WebUI (though that shouldn’t matter). The 14B model is around 9GB, which easily fits in the card’s 12GB of VRAM.

    I’m repeating the 28 t/s from memory, but even if I’m wrong it’s easily above 20.

    Specifically, I’m running this model: https://ollama.com/library/deepseek-r1:14b-qwen-distill-q4_K_M

    Edit: I confirmed I do get 27.9 t/s, using default ollama settings.
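
    For anyone who wants to reproduce a t/s figure like this, here is a minimal sketch. It assumes Ollama is listening on its default local port (11434) and that the model tag from the link above is already pulled. Ollama's `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds); the helper names `tokens_per_second` and `measure` are made up for illustration.

```python
import json
import urllib.request

# Assumed default local endpoint; adjust if Ollama runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:14b-qwen-distill-q4_K_M"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(prompt: str, model: str = MODEL, url: str = OLLAMA_URL) -> float:
    """Send one non-streaming request and return the generation speed in t/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])
```

    Calling `measure("Explain what a memory bus is.")` should return a number comparable to the one above, though it will vary with prompt length and whatever else is using the GPU. Running `ollama run <model> --verbose` should also print an eval rate without any code.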

    • Viri4thus@feddit.org
      2 hours ago

      Ty. I’ll try Ollama with the Q4_K_M quantization. I wouldn’t expect to see a difference between Ollama and SGLang.

    • Jeena@piefed.jeena.net
      8 hours ago

      Thanks for the additional information, it helped me decide to get the 3060 12GB instead of the 4060 8GB. They cost almost the same, but from what I gather the 3060 12GB fits my use cases better even though it is a generation older: the memory bus is wider and it has more VRAM. Both video editing and the smaller LLMs should work well enough.