While working on a Gluster project, I ran into an issue adding a third node to an existing cluster running on Oracle Linux 8. This was very frustrating, because the cluster was otherwise working fine. I went ahead and built a new cluster in my home lab to see if I could reproduce the issue.
Even in the home lab, I was unable to add a node to the cluster! The gluster peer probe failed:
[root@gluster1 ~]# gluster peer probe gluster3
peer probe: failed: gluster3 is either already part of another cluster or having volumes configured
After hitting the same problem in two different environments, I went through some troubleshooting steps, including checking SELinux, checking firewall ports, and completely rebuilding the third node. No joy!
I started digging through log files, and saw this error in /var/log/glusterfs/glusterd.log every time I tried to add the node:
[2023-09-04 17:22:30.130617] E [socket.c:2253:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1728250847) received from 192.168.200.110:49151
[2023-09-04 17:22:38.710707] E [socket.c:2253:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1728250739) received from 192.168.200.110:49151
[2023-09-04 17:22:39.433087] E [socket.c:2253:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1728250628) received from 192.168.200.110:49151
Hmm… odd error. What could cause this? Wrong message type… bad packet…
Then it hit me like a brick the next day! I had encrypted the cluster, and the third node didn't have its pem file added to the list of CAs! So I quickly appended the .pem file from gluster3 to /etc/ssl/glusterfs.ca on all nodes of the cluster. And BAM! It worked!
[root@gluster1 ~]# gluster peer probe gluster3
peer probe: success
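As a rough sketch of that fix, the helper below appends a certificate to a CA file only if it is not already in there, so it is safe to rerun on every node. The function name append_cert and the /tmp/gluster3.pem path in the usage line are hypothetical, not part of Gluster; /etc/ssl/glusterfs.ca is the CA file mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: append a node certificate to a CA file
# unless the same certificate is already present.
# Usage: append_cert <cert.pem> <ca-file>
append_cert() {
    ac_cert="$1"
    ac_ca="$2"
    ac_ca_body=""

    # Refuse to continue without a non-empty certificate file.
    if [ ! -s "$ac_cert" ]; then
        echo "missing or empty certificate: $ac_cert" >&2
        return 1
    fi

    # Compare with line breaks stripped, so wrapping differences do not matter.
    ac_cert_body=$(tr -d '\r\n' < "$ac_cert")
    if [ -f "$ac_ca" ]; then
        ac_ca_body=$(tr -d '\r\n' < "$ac_ca")
    fi

    case "$ac_ca_body" in
        *"$ac_cert_body"*) return 0 ;;   # already listed, nothing to do
    esac

    # Make sure the CA file ends with a newline before appending.
    if [ -s "$ac_ca" ] && [ -n "$(tail -c 1 "$ac_ca")" ]; then
        echo >> "$ac_ca"
    fi
    cat "$ac_cert" >> "$ac_ca"
}
```

After copying gluster3's pem file to a node, it could be called as: append_cert /tmp/gluster3.pem /etc/ssl/glusterfs.ca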
So, why? What happened?
In all my clusters I enable encryption from day one. This encrypts the traffic between the cluster nodes, which adds security, especially when running clusters in the cloud, where you never know who might be listening to your network traffic. This worked great when the clusters were built, but when I added a node, its traffic could not be decrypted: gluster1 was sending encrypted packets to gluster3, and the two nodes did not have each other's pem files in their CA lists (a pem file holds a node's certificate, and /etc/ssl/glusterfs.ca is the list of certificates that node trusts). Without a trusted certificate on both sides, glusterd could not make sense of the traffic, so it reported it as a bad packet.
Once I added the pem file to the CA list on all the nodes, every node could decrypt the messages, and everything was back to normal!
As a note to the Gluster developers: PLEASE add better error handling. An error message in glusterd.log that said it was a key error would have been very helpful!
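Until then, a quick scan of the log can at least show which peer is triggering the error. This is a minimal sketch that assumes the log line format shown earlier, with the peer address as the last field; the function name wrong_msgtype_peers is made up for illustration. In my case a hit meant a missing certificate, but the same message may well have other causes.

```shell
#!/bin/sh
# Hypothetical helper: list peers that triggered "wrong MSG-TYPE"
# errors in a glusterd log, with a count per peer address.
# Usage: wrong_msgtype_peers <logfile>
wrong_msgtype_peers() {
    awk '/wrong MSG-TYPE/ {
        addr = $NF                  # last field, e.g. 192.168.200.110:49151
        sub(/:[0-9]+$/, "", addr)   # drop the port
        count[addr]++
    }
    END {
        for (a in count) printf "%s %d\n", a, count[a]
    }' "$1"
}
```

It could be run as: wrong_msgtype_peers /var/log/glusterfs/glusterd.log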
And what about small files or PHP apps on the cluster? Does the read keep slow yet?
Not sure what you mean by “read keep slow”? If you are talking about I/O, it’s more about the underlying storage than GlusterFS.