I’ve started encountering a problem that I could use some assistance troubleshooting. I’ve got a Proxmox system that primarily hosts my Opnsense router. I’ve had this specific setup for about a year.

Recently, I’ve been experiencing sluggishness and noticed that the IO wait is through the roof. Rebooting the Opnsense VM, which normally takes only a few minutes, is now taking upwards of 15-20. The entire time, my IO wait sits between 50% and 80%.
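For anyone wanting to put a number on the IO wait without extra tools, here is a minimal sketch that derives it from two samples of the aggregate `cpu` line in `/proc/stat` on the Proxmox host. The function names `iowait_pct` and `sample_iowait` are made up for this example.

```shell
#!/bin/sh
# Percentage of CPU time spent in iowait between two samples of the
# aggregate "cpu" line from /proc/stat.
# Field order after the "cpu" label: user nice system idle iowait irq softirq steal ...

# iowait_pct "<first cpu line>" "<second cpu line>"   (hypothetical helper)
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user through steal
            t[NR] = total
            w[NR] = $6                             # iowait column
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# sample_iowait [seconds]: take two live samples and print the percentage.
sample_iowait() {
    interval="${1:-5}"
    first=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    second=$(grep '^cpu ' /proc/stat)
    iowait_pct "$first" "$second"
}
```

Running `sample_iowait 5` during a slow VM reboot gives a figure comparable to what the Proxmox summary graph shows. `zpool iostat -v rpool 5` is a useful companion, since it shows how busy the pool's disk is over the same interval.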

The system has a single disk in it, formatted with ZFS. I’ve checked dmesg and the syslog for indications of disk errors (this feels like a failing disk) and found none. I also checked the SMART statistics, and they all “PASSED”.
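A “PASSED” result is only the drive's overall health self-assessment, so it can be worth looking at the individual attributes as well. The sketch below filters `smartctl -A` output for a few commonly watched ATA attributes (IDs 5, 197, 198 and 199) and prints any whose raw value is non-zero. The function name `flag_smart_attrs` and the device path `/dev/sda` in the comments are assumptions for illustration.

```shell
#!/bin/sh
# flag_smart_attrs (hypothetical helper): read `smartctl -A` output on stdin
# and print "ID NAME RAW" for watched attributes with a non-zero raw value.
#   5   Reallocated_Sector_Ct
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
#   199 UDMA_CRC_Error_Count
flag_smart_attrs() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 || $1 == 199 {
             # Column 10 is RAW_VALUE in the ATA attribute table.
             if ($10 + 0 != 0) print $1, $2, $10
         }'
}

# Typical use on the host (device name is an assumption):
#   smartctl -A /dev/sda | flag_smart_attrs
#   dmesg --level=err,warn
#   journalctl -k -p warning -b
```

No output from the filter means none of those counters has moved, which fits the "no disk errors" picture. Any output would point back toward the drive or its cabling.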

Any pointers would be appreciated.

Example of my most recent host reboot.

Edit: I believe I’ve found the root cause of the change in performance, and it was a bit of shooting myself in the foot. I’ve been experimenting with different tools for log collection, and the most recent one is a SIEM tool called Wazuh. I didn’t realize that upon reboot it runs an integrity check that generates a ton of disk I/O. So when I rebooted this Proxmox server, that integrity check was running on Proxmox, my Pi-hole, and (I think) Opnsense concurrently, all against a single consumer-grade HDD.
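The integrity check described here is Wazuh's file integrity monitoring, configured in the `<syscheck>` section of the agent's `ossec.conf`. As a rough sketch, the helper below prints the timing-related settings from that section so they can be compared across agents. The function name `show_syscheck_timing` is made up, and the default path `/var/ossec/etc/ossec.conf` is the usual Linux agent location, so treat it as an assumption for other platforms.

```shell
#!/bin/sh
# show_syscheck_timing [config] (hypothetical helper): print the settings in
# the <syscheck> section that control when integrity scans run.
show_syscheck_timing() {
    conf="${1:-/var/ossec/etc/ossec.conf}"   # assumed default agent config path
    # Restrict to the <syscheck> block, since other sections (for example
    # rootcheck) also use <frequency> and <disabled>.
    sed -n '/<syscheck>/,/<\/syscheck>/p' "$conf" |
        grep -E '<(disabled|frequency|scan_on_start|scan_time|scan_day)>'
}

# Typical use on each agent:
#   show_syscheck_timing
```

If `scan_on_start` is `yes` on every guest, all of them start a full scan at boot at the same time. Setting it to `no` on some agents, or giving them different `scan_time` values, is one way to stagger the load on a shared disk.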

Thanks to everyone who responded. I really appreciate all the performance tuning guidance. I’ve also made the following changes:

  1. Added a 2nd drive (I have several of these lying around, don’t ask), converting the ZFS pool into a mirror. This gives me redundancy and should also improve read performance.
  2. Configured a 2nd storage target in Proxmox on the same zpool, with compression enabled and a 64k block size. I then migrated the 2 VMs to that storage.
  3. Since I’m collecting logs in Wazuh, I set Opnsense to use RAM disks for /tmp and /var/log.
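For readers wanting to reproduce changes 1 and 2, the commands look roughly like the sketch below. It is a dry run by default and only prints what it would do. The new disk path, dataset name, storage ID, VM ID and disk name are all placeholders, not values from this setup. Since `rpool` is a boot pool, the new disk would also need a matching partition layout and bootloader setup before the attach, which this sketch does not cover.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 in the environment to actually run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

POOL="rpool"
OLD_DEV="ata-ST500LM021-1KJ152_W62HRJ1A-part3"   # existing device, from zpool status
NEW_DEV="/dev/disk/by-id/NEW-DISK-part3"         # placeholder for the 2nd drive
DATASET="$POOL/vmdata-64k"                       # hypothetical dataset name
STORAGE_ID="zfs-64k"                             # hypothetical Proxmox storage ID
VMID="100"                                       # hypothetical VM ID
DISK="scsi0"                                     # hypothetical disk name

# 1. Attach the second device to the existing one, turning it into a mirror.
run zpool attach "$POOL" "$OLD_DEV" "$NEW_DEV"

# 2. Create a dataset with compression enabled and register it as a
#    Proxmox storage with a 64k block size for new VM disks.
run zfs create -o compression=on "$DATASET"
run pvesm add zfspool "$STORAGE_ID" --pool "$DATASET" --blocksize 64k --content images,rootdir

# Move a VM disk onto the new storage (repeat per VM and disk).
run qm disk move "$VMID" "$DISK" "$STORAGE_ID"
```

The block size only applies to disks created on that storage, which is why the VMs had to be migrated rather than left in place. On older Proxmox releases the last command is spelled `qm move_disk`.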

Rebooted Opnsense and it was back up in 1:42 min.

  • I’m starting to lean towards this being an I/O issue, but I haven’t figured out what or why yet. I don’t often make changes to this environment since it’s running my Opnsense router.

    root@proxmox-02:~# zpool status
      pool: rpool
     state: ONLINE
    status: Some supported and requested features are not enabled on the pool.
            The pool can still be used, but some features are unavailable.
    action: Enable all features using 'zpool upgrade'. Once this is done,
            the pool may no longer be accessible by software that does not support
            the features. See zpool-features(7) for details.
      scan: scrub repaired 0B in 00:56:10 with 0 errors on Sun Apr 28 17:24:59 2024
    config:
    
            NAME                                    STATE     READ WRITE CKSUM
            rpool                                   ONLINE       0     0     0
              ata-ST500LM021-1KJ152_W62HRJ1A-part3  ONLINE       0     0     0
    
    errors: No known data errors
    
    • Pyrosis@lemmy.world · 6 months ago

      It looks like you could also do a zpool upgrade. This will just upgrade your legacy pool to the newer ZFS feature set. That command is fairly simple to run from the terminal if you are already examining the pool.

      Edit

      Btw, if you have run PVE updates, it may be expecting some newer ZFS feature flags for your pool. A pool upgrade may resolve the issue by enabling the new features.
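
      Before upgrading, it can help to see which feature flags are actually disabled on the pool. A small read-only sketch, assuming the pool is named `rpool` as in the status output above (the function name `list_disabled_features` is made up for this example):

      ```shell
      #!/bin/sh
      # list_disabled_features (hypothetical helper): read `zpool get all <pool>`
      # output on stdin and print the feature flags that are still disabled.
      list_disabled_features() {
          awk '$2 ~ /^feature@/ && $3 == "disabled" {
                   sub(/^feature@/, "", $2)
                   print $2
               }'
      }

      # Typical use (read-only):
      #   zpool get all rpool | list_disabled_features
      #   zpool upgrade            # lists pools that do not have all features enabled
      # The upgrade itself is one-way, so it is left commented out here:
      #   zpool upgrade rpool
      ```

      Nothing in this sketch changes the pool. Once features are enabled, older ZFS implementations may no longer be able to import it, as the status message itself warns.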

        • Pyrosis@lemmy.world · 6 months ago

          Upgrading a ZFS pool itself shouldn’t make a system unbootable even if an rpool (root pool) exists on it.

          That could only happen if the upgrade took a shit during a power outage or something like that. The upgrade itself usually only takes a few seconds from the command line.

          If it makes you feel better, I upgraded mine with an rpool on it and it was painless. I do have everything backed up tho, so I rarely worry. However, I understand being hesitant.

          • I’m referring to this.

            … using grub to directly boot from ZFS - such setups are in general not safe to run zpool upgrade on!

            $ sudo proxmox-boot-tool status
            Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
            System currently booted with legacy bios
            8357-FBD5 is configured with: grub (versions: 6.5.11-7-pve, 6.5.13-5-pve, 6.8.4-2-pve)
            

            Unless I’m misunderstanding the guidance.