Feb 01, 2025
9 min read

Server Management with Proxmox: From a Single Node to High Availability (HA)

Proxmox enabled me to transition from a single-node setup to a high-availability cluster, ensuring data redundancy, improved service continuity, and simplified management with ZFS storage.


Introduction

In the age of virtualization, efficiently managing hardware resources and ensuring continuous service availability are essential. Proxmox has been my platform of choice for managing virtual machines (VMs) and containers in my infrastructure. In this blog post, I’ll share my experience with Proxmox, how I transformed my homelab from a single node to a high availability (HA) setup, and explain the key decisions I made regarding storage, data replication, and cluster configuration.

What is Proxmox?

Proxmox is an open-source virtualization platform that offers a powerful suite of tools to manage virtual machines and containers. Based on technologies like KVM (for virtual machines) and LXC (for containers), Proxmox also integrates advanced storage and networking solutions. One of the platform’s standout features is its user-friendly web interface, which allows system administrators to manage multiple nodes and resources without the need for complicated or expensive proprietary software solutions.

Why did I choose Proxmox?

Proxmox allowed me to consolidate my infrastructure efficiently. With its ability to handle both virtual machines and containers on a single platform, I was able to reduce complexity and centralize the management of my environment. Additionally, Proxmox integrates with advanced technologies like ZFS and Ceph, which was crucial for my goal of improving data availability and redundancy.

My Journey from a Single Node to a Proxmox Cluster

My homelab started with a single Proxmox node, which allowed me to virtualize several machines for different tasks like web servers, databases, and cloud storage. This approach, while simple and effective, didn’t provide data redundancy or high availability. If that node’s hardware failed, my entire environment would be impacted, with potential data loss and downtime.

The Need for High Availability (HA)

To address this challenge, I decided to expand my homelab and move to a high availability configuration. The key to this change was adding a second node to my Proxmox cluster, so that if one node failed, the other could automatically take over without manual intervention. This transition also allowed me to explore more advanced and efficient storage options like ZFS, which provided data replication and integrity features that were essential for a high availability environment.

Installing the Second Node

Installing a second node in Proxmox was relatively straightforward thanks to the platform’s well-documented setup process. It came down to these simple steps:

  1. Installing Proxmox on the new node: I began by installing Proxmox on the second server. The installation process is straightforward and well-documented, which made the initial setup quick and easy. I really recommend installing it on an NVMe SSD for better performance, or at least on a SATA SSD; if you’re on a budget, you can install it on an HDD, with the downside of slower boots and slower performance for some operations.

  2. Connecting to the existing cluster: Once Proxmox was installed on both nodes, I connected the new node to the existing cluster. This was done easily through Proxmox’s web interface, where you can add a node to an already configured cluster by simply entering the appropriate credentials. If you don’t have a cluster yet, you can create one by following the steps in the Proxmox documentation (see the command sketch after this list).

  3. Network and shared storage configuration: Both nodes were set up to share resources and storage, which was crucial for data replication and high availability. This step also involved configuring the network to ensure smooth communication between the two nodes. For shared storage, it’s recommended to have a solid network setup with at least a 1 Gbps connection between the nodes; if you can, use a 10 Gbps or faster connection for better performance.

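As a reference, here is roughly what those steps look like from the command line, assuming a hypothetical cluster name homelab and a first node reachable at 192.168.1.10; the same can be done from the web interface.

    # On the first node (run once): create the cluster
    pvecm create homelab

    # On the new second node: join the existing cluster by pointing it at the first node
    pvecm add 192.168.1.10

    # On either node: confirm both members are listed and the cluster has quorum
    pvecm status
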
With these simple steps, my infrastructure went from a single node to a two-node cluster, ready to handle a high availability environment.

Proxmox Storage Options: LVM, LVM Thin, Directory, ZFS, and Ceph

Once the cluster was configured, I faced the decision of choosing the right storage options. Proxmox offers several storage solutions, each with its own features and benefits depending on the needs of the infrastructure. Below, I explain the main options and how I selected the one best suited for my situation:

  1. LVM (Logical Volume Manager): LVM is a technology that allows you to manage logical volumes in a flexible manner. With LVM, you can create, resize, and delete disk volumes easily without losing data. While LVM is a reliable and straightforward option, it doesn’t offer advanced features like data deduplication or replication.

  2. LVM Thin: LVM Thin is a more efficient variant of LVM. It allows you to create logical volumes in a more optimized way, reducing space wastage. This option is ideal when you need to create many volumes without occupying too much physical space on disk, but it still lacks some of the advanced features I was looking for in a high-availability setup.

  3. Directory: Directory storage refers to using a directory in the file system to store VMs or containers. It’s a simple and effective solution if advanced features aren’t required. However, it doesn’t offer the redundancy or fault protection that other storage systems provide.

  4. ZFS: ZFS is a powerful file system that offers advanced features like data replication, compression, and deduplication. ZFS is an excellent choice for high availability environments, as it ensures data integrity and redundancy with RAID-Z configurations. ZFS was the option I chose for my infrastructure due to its robust features and reliability (see the pool-creation sketch after this list).

  5. Ceph: Ceph is a distributed storage solution designed for high availability and scalability. It’s an excellent choice for large infrastructures, but it can be complex to implement and maintain, especially with a limited number of nodes. Given that my infrastructure only had two nodes, Ceph wasn’t the best option due to its complexity and resource requirements.
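
To make the ZFS option more concrete, here is a minimal sketch of creating a mirrored pool and registering it as Proxmox storage. The disk names (/dev/sdb, /dev/sdc), the pool name tank, and the storage ID local-zfs are assumptions for illustration, not my actual layout.

    # Create a mirrored pool from two spare disks, with lz4 compression enabled
    zpool create -o ashift=12 -O compression=lz4 tank mirror /dev/sdb /dev/sdc

    # Register the pool as a Proxmox storage backend for VM disks and container volumes
    pvesm add zfspool local-zfs --pool tank --content images,rootdir

Note that for replication between nodes, the pool and the storage entry need to exist with the same name on both nodes.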

Why I Chose ZFS with Replication

The solution I ultimately chose for my storage was ZFS, a robust and advanced file system that provides several key features that made it ideal for my infrastructure. ZFS offers data replication capabilities, which is crucial in a high availability environment.

Benefits of ZFS in My Setup:
  1. Replication: ZFS allows for automatic and efficient replication of data between nodes. This replication is essential to ensure that in the event of a node failure, the other node has an exact copy of all data, minimizing downtime. Although replication must be configured as a scheduled job (much like a cron job), it’s a simple and effective way to ensure data redundancy, with the downside of not being real-time. If real-time replication isn’t a requirement, ZFS replication is a great option, and if you want a shorter window between replications, you can configure the job to run more frequently (see the sketch after this list). For synchronous replication you can use Ceph, but as I mentioned before, it’s more complex to implement and maintain.

  2. Data integrity: ZFS includes advanced data integrity verification mechanisms, protecting against file corruption and ensuring that the stored data is always consistent.

  3. Compression and deduplication: ZFS also offers features like compression and deduplication, which significantly reduce storage space usage, improving overall efficiency.

  4. Ease of management: Despite its power, ZFS is relatively easy to configure and manage in Proxmox. Thanks to its native integration with Proxmox, I was able to quickly set up replication between nodes without extra complexity.
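
To give an idea of how simple the scheduling is, this is roughly what a replication job looks like from the CLI (the web interface exposes the same options). The guest ID 100, the job ID 100-0, and the target node name pve2 are hypothetical.

    # Replicate guest 100 to node pve2 every 15 minutes
    pvesr create-local-job 100-0 pve2 --schedule '*/15'

    # List the configured jobs and check when each last ran
    pvesr list
    pvesr status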

Why Not Ceph?

While Ceph offers an excellent distributed storage solution, I chose ZFS to simplify management and ensure that my infrastructure didn’t become overloaded with complex tasks. ZFS replication was more than sufficient to meet my redundancy needs with only two nodes.

High Availability (HA) in Proxmox: Ensuring Service Continuity

High availability (HA) is a crucial concept for infrastructures that require continuous operation, even in the event of failures. In Proxmox, HA is implemented through a cluster of nodes that work together to ensure that if one node fails, virtual machines or containers are automatically migrated to the available node.

How HA Works in Proxmox

Proxmox uses an HA manager to continuously monitor the status of the nodes and virtual machines. If a node fails or becomes disconnected, the HA manager automatically migrates the virtual machines from the failed node to the remaining nodes, ensuring service continuity.

The migration process is not instantaneous, but it’s fast enough to minimize downtime to 1-2 minutes and ensure that services remain available. The HA manager also monitors the status of the virtual machines and can restart them on a different node if necessary.
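
As a small sketch of what this looks like in practice, these are the CLI equivalents of enabling HA for a guest; the VM ID 100, the group name prefer-node1, and the node names pve1/pve2 are assumptions.

    # Create an HA group that prefers pve1 but can fail over to pve2
    ha-manager groupadd prefer-node1 --nodes "pve1:2,pve2:1"

    # Put VM 100 under HA management in that group and keep it running
    ha-manager add vm:100 --group prefer-node1 --state started

    # Watch what the HA stack is currently doing
    ha-manager status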

Implementing HA with Two Nodes and a “Tie-Breaker”

Since my cluster only has two nodes, I implemented a tie-breaker node using a Raspberry Pi Zero. This small device acts as an arbitrator in the cluster to prevent split-brain, a situation where both nodes believe they are the leader of the cluster, potentially leading to data corruption. With the tie-breaker, Proxmox can quickly make decisions about which node should remain active in case of a failure.
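
In Proxmox terms this external tie-breaker is a QDevice. As a rough sketch, assuming the Pi runs a Debian-based OS and is reachable at the hypothetical address 192.168.1.5:

    # On the Raspberry Pi: install the external vote daemon
    apt install corosync-qnetd

    # On both Proxmox nodes: install the QDevice client
    apt install corosync-qdevice

    # From one Proxmox node: register the Pi as the cluster's tie-breaker
    pvecm qdevice setup 192.168.1.5

    # Verify that the expected votes now include the QDevice
    pvecm status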

Conclusion

The transition from a single node to a high availability cluster in Proxmox was a key decision to ensure the continuity of my services and protect data in the event of failures. Thanks to Proxmox’s powerful features like ZFS replication, high availability, and easy storage management, my infrastructure is now more reliable and efficient. If you’re considering a similar setup, I highly recommend exploring Proxmox, which offers great flexibility and scalability to meet the needs of your environment.