Clustering fun with NetWare

It's been one of those weeks.

To top it off, I ran into an apparent communications issue last night on a two-node NetWare 6.5 cluster. The machines (a matched pair of ProLiant DL380 G4s) were both configured for load balancing and fault tolerance on their NICs. For some reason, without any hardware, software, or infrastructure change, they started missing ticks and alternately casting themselves out of the cluster.
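For anyone who hasn't watched a cluster do this, here is a rough sketch of the general idea behind heartbeat-based eviction: each node records when it last heard a tick from its peers, and a peer that stays silent longer than some tolerance is treated as gone. This is not NetWare's actual code. The class name, the method names, and the 8-second tolerance are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy model of tick tracking; not how NetWare implements it."""
    tolerance: float  # seconds a peer may stay silent (hypothetical value)
    last_seen: dict = field(default_factory=dict)

    def record_tick(self, node: str, now: float) -> None:
        # Remember the time of the most recent tick heard from this node.
        self.last_seen[node] = now

    def expired(self, now: float) -> list:
        # Nodes whose last tick is older than the tolerance window.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.tolerance
        )


# Two nodes tick at t=0, then only node1 keeps ticking.
monitor = HeartbeatMonitor(tolerance=8.0)  # assumed tolerance, in seconds
monitor.record_tick("node1", now=0.0)
monitor.record_tick("node2", now=0.0)
monitor.record_tick("node1", now=5.0)
print(monitor.expired(now=9.0))  # node2 has been silent for 9 s
```

If a flaky link drops ticks in both directions, each node can take a turn looking dead to the other, which would at least be consistent with the alternating cast-outs described above.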

The primary NIC in each system reported random media connection failures, though the switches (all three; I was working remotely, so I'm not sure which NIC was connected to which switch) appeared to be fine. Still, the problem went away when I disabled the primary NIC in each box. The cables were good, too: all relatively new Cat 6. All NICs were configured for autosense, detected gigabit switch connections, and went to full duplex.

The switches are all Linksys SR2024 24-port, unmanaged 10/100/1000 units of varying vintage.

I'll post more when I have a better idea of what was really going on. Upon (re)enabling the primary NIC in one server, I was greeted by a repeating (incrementing) message on the logger screen:

NWCLSTR_Node_Tick: got called with invalid node number nnnnnnn

(where nnnnnnn kept counting up).

More research to do...

