
UPS Monitoring Matrix 9 April 2007

Posted by Mark in VMware Server.

I spent part of my Sunday reading through the nut and apcupsd manuals to get more familiar with these packages. All my research is theoretical at this stage; I just needed to decide which UPS monitoring package to invest my time in this week. Both packages have a good high-level overview of power management configuration. As I suspected in my comment to Saturday’s post, they are useful reading for planning your power management conceptually before getting tangled in the quirks of rpm -i or make install.

I realized that I had better make sure I have a client machine in my rack running the monitor that can control power management. It won’t do to have the UPS beeping at me and all of my desktop machines dead because their lightweight batteries caused a shutdown before the rack UPS ran out of juice. I also realized that I would have to work up the courage to do a simulated shutdown at some stage, before a real one happened at a less opportune time.

Here is a matrix comparing the three UPS monitoring software packages on the key points I needed:

                                  PowerChute  apcupsd  nut
All components can run on Linux   No          Yes      Yes
Requires Linux GUI                N/A         No       No
LogWatch monitoring               ?           note 1   note 2
Big Brother monitoring (deadcat)  Yes         Yes      Yes
Package maintenance (yum)         No          No       No (note 3)
Shutdown for these slave OSes:
  Linux                           Yes         Yes      Yes
  Solaris                         Yes         Yes      Yes
  Windows                         Yes         Yes      Yes

Notes:
  1. “The apcupsd philosophy is that all logging should be done through the syslog facility”.
  2. Looks trivial to set up with the upslog client.
  3. The only official releases from this project are source code.

In conclusion, I am going to put my money on the nut horse. I gave up on PowerChute early on, since it gave me the impression that it was not seriously addressing the Unix world (and I wouldn’t be able to do much to change that). Nut looks like it has a cleaner design than apcupsd, though it may lag behind somewhat in implementation. I hope I am not disappointed in WinNUT as a simple client to shut down virtual Windows PCs, but I will at least give it a try.


APC PowerChute or Network UPS Tools? 7 April 2007

Posted by Mark in VMware Server.

I spent most of yesterday struggling with APC PowerChute Business Edition (PBE). I downloaded the free 7.0.5 version from APC’s website (Linux and Windows; registration probably required), plugged in the USB cable provided with my 2U SmartUPS 1500, and started the install.

The PBE install requires an agent to run on each computer connected to the UPS. These agents contact a server (only available for Windows platforms), which amalgamates information from several UPSes and communicates it to a PBE client program (also Windows) for management. I was about to qualify the first sentence by saying that the agent must run on each computer with a data connection, rather than a power connection, to the UPS. But then I realized that I am still fuzzy about how a shutdown signal would be communicated from the computer with the USB connection to the SmartUPS to the other computers in the rack, or even to the virtual computers running within that host. After all, if the UPS is about to shut down, I would like the separate media PC in the rack to shut down gracefully. Likewise, all of the virtual servers on my host running SAMBA, LDAP, or other services should also go down gently.

As I plugged the USB connector into my CentOS host, I got the following in /var/log/messages:

Apr 6 15:31:22 bellerophon kernel: hiddev97: USB HID v1.10 Device [American Power Conversion Smart-UPS 1500 RM FW:617.3.D USB FW:8] on usb-0000:00:1d.0-1

I checked /dev for devices created in the last day and saw:

[user@server ~]$ ls -l /dev | grep "Apr 6"
crw------- 1 root root 180, 97 Apr 6 18:39 hiddev1
crw-rw-rw- 1 root tty 5, 2 Apr 6 18:42 ptmx
crw-rw-rw- 1 root root 5, 0 Apr 6 18:40 tty

This confused me somewhat about whether the UPS should be addressed as /dev/hiddev97 or /dev/hiddev1. I tried each in turn, and although PBE could discover the agent, when I tried adding it to my device list I got the message “Failed to apply the configuration profile to [IP address]”. That sent me scurrying off to Google for documentation on hid, hotplug and RedHat USB devices in general. Before I could satisfy my theoretical curiosity, though, I found something that sent me on another tangent.

One of my Google searches turned up a page claiming that USB cables didn’t work with APC devices under Linux. Another response suggested using NUT.
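One possible explanation for the two names, offered as a sketch rather than a diagnosis: the Linux kernel's device list reserves character major 180 for USB devices and gives the HID devices minors 96 through 111, so the minor 97 in the ls listing would correspond to the second HID slot, hiddev1. If the kernel message prints the minor number rather than the index, hiddev97 and /dev/hiddev1 would be the same device. The helper names and constants below are my own; the base of 96 is an assumption taken from the kernel's device list, not from anything in the PBE documentation.

```python
import os
import stat

USB_MAJOR = 180          # character major reserved for USB devices
HIDDEV_MINOR_BASE = 96   # assumption from the kernel device list: minor 96 is hiddev0
HIDDEV_MAX_DEVICES = 16  # minors 96 through 111


def hiddev_name_from_minor(minor: int) -> str:
    """Map a USB HID minor number to its conventional device name."""
    index = minor - HIDDEV_MINOR_BASE
    if not 0 <= index < HIDDEV_MAX_DEVICES:
        raise ValueError(f"minor {minor} is outside the hiddev range")
    return f"hiddev{index}"


def describe_device(path: str) -> str:
    """Report major and minor numbers for a character device node."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return f"{path}: major {os.major(st.st_rdev)}, minor {os.minor(st.st_rdev)}"


if __name__ == "__main__":
    # The ls listing shows "180, 97" for hiddev1; map that minor back to a name.
    print(hiddev_name_from_minor(97))
```

On the host itself, describe_device("/dev/hiddev1") could be used to confirm the major and minor numbers without reading them off ls by eye.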

So here’s my plan for the day:

  1. Try the second cable that came with the SmartUPS (a DB9 with the code 940-1524D on it) to see if I can get the PBE software running.
  2. Evaluate whether PBE can send other computers a shutdown signal the way I would like.
  3. See whether the Linux PBE agent leaves messages in a system log so I can parse it for LogWatch or Big Brother.
  4. At signs of trouble, switch over to NUT and see how it does with items 2 and 3. Being open source, I expect it to be easier to modify.
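For item 3, a minimal sketch of what parsing a system log line might look like. It uses the kernel message quoted above as sample input, since I do not yet know whether the PBE agent writes to syslog at all or what its messages look like. The function names, the LogEntry type, and the keyword list are all hypothetical; a real LogWatch or Big Brother hook would need its own format.

```python
import re
from typing import NamedTuple, Optional

# Classic syslog layout: "Mon D HH:MM:SS host tag: message"
SYSLOG_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<tag>[^:\s]+):\s+"
    r"(?P<message>.*)$"
)

# Hypothetical keywords; chosen only because they appear in the sample line.
UPS_KEYWORDS = ("smart-ups", "american power conversion")


class LogEntry(NamedTuple):
    month: str
    day: int
    time: str
    host: str
    tag: str
    message: str


def parse_syslog_line(line: str) -> Optional[LogEntry]:
    """Split one syslog line into fields, or return None if it does not match."""
    m = SYSLOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        m["month"], int(m["day"]), m["time"], m["host"], m["tag"], m["message"]
    )


def is_ups_event(entry: LogEntry) -> bool:
    """Crude filter: does the message mention the UPS by name?"""
    text = entry.message.lower()
    return any(keyword in text for keyword in UPS_KEYWORDS)


if __name__ == "__main__":
    sample = (
        "Apr 6 15:31:22 bellerophon kernel: hiddev97: USB HID v1.10 Device "
        "[American Power Conversion Smart-UPS 1500 RM FW:617.3.D USB FW:8] "
        "on usb-0000:00:1d.0-1"
    )
    entry = parse_syslog_line(sample)
    if entry is not None and is_ups_event(entry):
        print(entry.host, entry.time, entry.message)
```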

Installing Intel’s RAID Web Console 2 4 April 2007

Posted by Mark in VMware Server.

Finally got an install of Intel’s RAID Web Console 2 I’m happy with.

My server configuration is an Intel S5000PSL motherboard with six SATA drives. Five are in a single RAID 5 logical array, with the sixth configured as a hot spare. All of this was configured through the BIOS RAID utility.

The operating system on the server is the CentOS 4 x86_64 distribution of Linux with the megasr driver (version 06.28.11.0.2006). I wanted to avoid installing a graphical user interface on the server. The server will be a VMware host for testing virtual machines, so I would like to keep the packages installed at this level to a minimum.

This meant I needed a way to monitor and configure the RAID system without halting the OS and jumping down to the BIOS. My plan was to use Intel’s RAID Web Console 2 application client on a Windows PC talking to the server software running on CentOS.

Both the Intel Deployment Assistant CD version 1.0.1 (June 2006) that was delivered with the motherboard, and the version 1.2 (November 2006) that I downloaded from the Intel site contained RAID Web Console 1.13. Installing and running this version in client/server mode only gave a Java popup window after 15 seconds or so with the message “No Servers Found”.

I found a more recent version of RAID Web Console 2 for Linux on the Intel site (1.19), but struggled to find a matching version for Windows through the Intel site search. Finally, a Google search (“RWC2 1.19 site:intel.com”) turned up a ReadMe file for the Windows version that let me guess the URL for the Windows download.

The Linux server side installed two services in /etc/init.d: mrmonitor and vivaldiframeworkd. I’m not convinced that vivaldiframeworkd started up on its own. It may also be that it died shortly after the installation, but I had to do a sudo /sbin/service vivaldiframeworkd restart before getting a successful test. I should reboot the server to make sure that both of these scripts start up correctly. I also disabled the firewall on the server (using system-config-security) and did lots of netstat -ltpn and scrolling through /tmp/vivaldi_startup.txt before I saw that the service was running on port 49258. With all of the installs and uninstalls of different versions of RWC2 on both client and server, I’m not certain what may have interfered with the server coming up cleanly.
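A quicker way to check from either machine whether the service is reachable might be a small TCP probe like the sketch below. It only tests that something accepts connections on the port; it says nothing about whether that something is vivaldiframeworkd. The function name is my own, and 49258 is simply the port I saw on my install, so it may well differ elsewhere.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 49258 is the port observed in this post; treat it as an assumption.
    print(is_port_open("127.0.0.1", 49258))
```

Run from the Windows client with the server's address in place of 127.0.0.1, this would also show whether a firewall is in the way.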

Once both client and server were at version 1.19, the program came up successfully under Windows (though the Java app doesn’t seem to show up on the taskbar until you have alt-tabbed it into focus). I did a quick backup of the server, then set disk 4 of my array offline.

RAID Web Console 2

Success! Though the virtual disk showed up as “degraded” in the RWC2 application, my ssh session from Windows to server was unaffected. My hot spare disk came into play and the array started rebuilding. The lights on the server flashed frantically for the next hour and fifteen minutes, before the Virtual Disk state returned to “optimal”.

I marked disk 4 as a hot spare, then held my breath and physically removed disk 5 from the disk bay. A rebuild started on disk 4 (again indicated by the RWC2 application and flashing lights). I marched the still-beating disk 5 around the house, showing it to a suitably unimpressed family (though Luke made an effort so I wouldn’t feel bad). RWC2 also removed the disk icon from port 5. I returned disk 5 to its bay, where it showed as “unconfigured & good”, then right-clicked on it and again set it to “hot spare”.

Still to do: See if I can integrate the RAID monitoring service into LogWatch, Big Brother, or some other alert system.