Q. Will the radar data degrade the network performance of my LAN?
A. No. The Auto–Nowcast Environment exists on two separate local area networks. The radar data is broadcast on a dedicated LAN which does not interfere with the LAN on which your other machines reside.
Q. What should I do when I cannot log in to one of the Auto–Nowcast Environment host machines?
A. If the host which contains the shared project space is down, it will not be possible to log in to any other Nowcast host. This is because the .cshrc files on all hosts are linked to the .cshrc in <shared project space host>:$PROJ_DIR/.cshrc. If you can't log in to any Nowcast host, try rebooting the host of the shared project space.
If the problem is isolated to one host, try rebooting that host.
Data Ingest & Management
Q. What should I do when a dataset is late or missing?
- A. Scream.
- If recent data files are not being written to disk in the data directory location, check to see if the data disk is full.
- If it is a NEXRAD radar dataset which is missing, make sure the radar data are being broadcast on the ethernet or via LDM.
- If everything looks ok up to this point, then the process responsible for generating this particular dataset may be having problems. First, determine the responsible process by examining the input/output tables in sections on Analysis Algorithms and Data Ingest. Once you have identified the responsible process, follow the diagnostic procedures for a missing or restarting process.
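The first check above (a full data disk, or no recent files in the data directory) can be sketched as a small shell helper. This is an illustration only: check_dataset_dir is a hypothetical name, the 30-minute default is arbitrary, and it assumes a POSIX shell (sh or bash) and a find that supports -mmin.

```shell
# Hypothetical helper: report how full the filesystem holding a data
# directory is, and how many files in it were modified recently.
check_dataset_dir() {
    dir=$1
    minutes=${2:-30}   # assumed default window; adjust per dataset
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # df -P prints one line per filesystem; field 5 is the percent used.
    usage=$(df -P "$dir" | awk 'NR == 2 { print $5 }')
    echo "disk usage for $dir: $usage"
    # -mmin is a GNU/BSD find extension, not strict POSIX.
    recent=$(find "$dir" -type f -mmin "-$minutes" | wc -l)
    echo "files newer than $minutes minutes: $((recent))"
}
```

For example, check_dataset_dir /path/to/dataset 15 reports the disk usage and the number of files written in the last 15 minutes; a count of 0 together with a usage of 100% points at a full data disk.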
Q. How can I be sure that radar data is coming into an Auto-Nowcast host?
A. First, check the raw radar data directories to see when the last data files arrived. A second check can be run on the hosts which have the dual ethernet cards installed for receiving the radar data. This check requires root login privileges.
To find out whether a host has 2 ethernet connections (expected: eth0 and eth1), run the command:
If 2 ethernet connections are not listed, this host is not directly receiving the radar data feed so this check is not valid. If both eth0 and eth1 are listed, look for which one has "Bcast" set to xxx.xxx.199.xxx or xxx.xxx.197.xxx. This is the port which would be receiving radar data. To check that data is actually arriving on that port, run the command (for eth0, replace eth1 below with eth0):
tcpdump -i eth1 udp port 3280
If radar data is arriving, you will see a data stream similar to the following:
16:35:06.810338 22.214.171.124.3280 > 126.96.36.199.3280: udp 1450
16:35:06.810338 188.8.131.52.3280 > 184.108.40.206.3280: udp 998
16:35:06.820338 220.127.116.11.3280 > 18.104.22.168.3280: udp 1450
16:35:06.820338 22.214.171.124.3280 > 126.96.36.199.3280: udp 998
If radar data are not being broadcast on the ethernet port, you will need to check on communications upstream of the Auto-Nowcast Environment.
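The interface check described above can be sketched in shell. The command for listing the interfaces is not shown in this text; the "Bcast" field suggests the legacy ifconfig output format, and the helper below assumes that format. radar_iface is a hypothetical name.

```shell
# Hypothetical helper: read legacy ifconfig-style output on stdin and
# print the interface whose "Bcast:" address has 199 or 197 as its
# third octet (the radar broadcast networks named above).
radar_iface() {
    awk '
        /^[^ \t]/ { iface = $1 }   # a non-indented line starts a new interface
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Bcast:/) {
                    split(substr($i, 7), oct, ".")
                    if (oct[3] == "199" || oct[3] == "197") print iface
                }
            }
        }
    '
}

# Possible use as root, assuming ifconfig is the listing command:
#   iface=$(ifconfig | radar_iface)
#   tcpdump -i "$iface" -c 5 udp port 3280   # -c 5 stops after five packets
```

If the helper prints nothing, either the host has no interface on a radar network or its ifconfig uses a different output format.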
If you are receiving radar data via an LDM, you can type ldmadmin watch to see whether bundles of radar data are being received for a particular radar ID. The output of ldmadmin watch looks like this:
Oct 13 20:34:50 pqutil: 11144 20031013203446.553 NEXRD2 125000 L2-BZIP2/KIND/20031013203501/125/0
Oct 13 20:34:52 pqutil: 12277 20031013203448.979 NEXRD2 181020 L2-BZIP2/KLOT/20031013203139/181/20
Oct 13 20:35:01 pqutil: 12226 20031013203457.569 NEXRD2 181021 L2-BZIP2/KLOT/20031013203139/181/21
Oct 13 20:35:09 pqutil: 11593 20031013203506.282 NEXRD2 181022 L2-BZIP2/KLOT/20031013203139/181/22
Oct 13 20:35:10 pqutil: 11322 20031013203506.248 NEXRD2 125001 L2-BZIP2/KIND/20031013203501/125/1
Oct 13 20:35:14 pqutil: 8206 20031013203511.688 NEXRD2 181023 L2-BZIP2/KLOT/20031013203139/181/23
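To see at a glance which radar IDs are arriving, saved ldmadmin watch output can be summarized per radar. The sketch below assumes the field layout of the sample lines above (log time in field 3, feed type in field 7, product identifier in field 9, with the radar ID as the second "/"-separated part of the identifier). radar_summary is a hypothetical name.

```shell
# Hypothetical helper: read saved "ldmadmin watch" lines on stdin and print,
# for each radar ID, the number of products seen and the last log time.
radar_summary() {
    awk '
        $7 == "NEXRD2" {
            split($9, part, "/")   # e.g. L2-BZIP2/KLOT/20031013203139/181/23
            id = part[2]           # radar ID, e.g. KLOT
            count[id]++
            last[id] = $3          # log time of the most recent line
        }
        END { for (id in count) print id, count[id], last[id] }
    ' | sort
}
```

Because ldmadmin watch keeps running until it is interrupted, it is simplest to save a few minutes of its output to a file first and then run radar_summary on that file. For the six sample lines above this gives KIND 2 20:35:10 and KLOT 4 20:35:14. A radar ID that is absent from the summary is not being received.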
Q. What happens to the Auto-Nowcast Environment when a data feed goes down?
A. Because so many processes in the Auto–Nowcast Environment rely on the radar data feed, very few processes will continue to generate results if the radar feed goes down.
If other data feeds go down, some algorithms which rely on those data may output "missing data" datasets, while others will wait for the input data to become available before producing any output.
In general, the Auto–Nowcast Environment will continue to run, and when data become available again the algorithms will resume outputting meaningful results.
Q. How much data is generated each day?
A. This will vary depending on the type and number of datasets at your installation.
Q. What should I do when one of the data partitions becomes full?
A. If the "/home" or cross–mounted shared data partition disk is 100% full, you should check that the events list is not saving so many cases that the data scrubber cannot free up sufficient disk space. If this is the case, the older cases should be archived and removed from $DATA_HOME/params/events.list.
If the "/" (a.k.a. "root") partition is 100% full, you must reboot the machine.
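To find out which partition is full before deciding which of the two cases applies, the output of df can be filtered. This is a minimal sketch: full_partitions is a hypothetical name, and it assumes mount points without spaces.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the mount
# point and percent used of every partition at or above a limit.
full_partitions() {
    limit=${1:-100}   # percent used; defaults to completely full
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%$/, "", pct)        # "100%" -> "100"
            if (pct + 0 >= limit + 0) print $6, $5
        }
    '
}

# Possible use:
#   df -P | full_partitions 95
```

Running it with a limit below 100 (for example 95) gives some warning before the data scrubber can no longer keep up.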
Q. What should I do when a process is restarting or missing?
A. Following are some actions to take for diagnosing restarting or missing processes:
- Make sure the application executable exists in $RAP_BIN_DIR or $RAP_SHARED_BIN_DIR
- Make sure that the application instance in the process table matches the instance in the application start script.
- Examine the application log file.
- Run the process startup script manually to see what errors are printed out. To run the process manually first kill the process by running the appropriate kill script, then in an xterm, re–start the process using the appropriate start script.
To figure out the correct kill and start scripts for the process, examine the Host and Script specifications listed in the sections on Analysis Algorithms and The Project Directory.
NOTE: Be sure to kill and re–start the process on the correct host. The logical host is specified either in the start script or in the corresponding parameter file. The physical host is then determined using the UNIX echo command. For example: echo $TITAN_HOST
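The first diagnostic step (confirming that the executable exists) can be sketched as a shell function. find_app is a hypothetical name; only the $RAP_BIN_DIR and $RAP_SHARED_BIN_DIR variables come from the text above, and a POSIX shell is assumed.

```shell
# Hypothetical helper: look for an executable in $RAP_BIN_DIR, then in
# $RAP_SHARED_BIN_DIR, and print the first path found.
find_app() {
    app=$1
    for dir in "$RAP_BIN_DIR" "$RAP_SHARED_BIN_DIR"; do
        # Skip unset variables; require the file to be executable.
        if [ -n "$dir" ] && [ -x "$dir/$app" ]; then
            echo "$dir/$app"
            return 0
        fi
    done
    echo "$app not found in RAP_BIN_DIR or RAP_SHARED_BIN_DIR" >&2
    return 1
}
```

A non-zero return means the application is missing from both directories or is not executable, which would explain a process that never appears in the process table.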
Q. What happens to the Auto–Nowcast Environment when a process in the procmap is restarting continuously or is missing?
A. The process will not be able to generate its expected output. This may also affect other "downstream" processes which need that process's output.
In this situation, it is important to diagnose the source of the problem. For information on troubleshooting a process which is failing, see the discussion above.
Q. What is the difference between Mode 1 and Mode 2 process restarts?
A. Mode 1 process restarts are due to a process missing from the procmap. This occurs if a process exits prematurely.
Mode 2 process restarts are due to a process not registering frequently enough with the procmap (check the Heartbeat time in the procmap window). In this case, the process is believed to be "hung" and is stopped and restarted by the auto–restarter.
Q. What does it mean when I get the message "No more processes" or "Cannot exec" or "Cannot fork" when I try to run a command?
A. This usually indicates that the host has run out of processes, i.e., the process table is full. In order to run the command you need to free up some process space. If you are running extra processes which are not part of the Auto–Nowcast Environment, try to exit them. If that does not free up sufficient process space, try running the command exec top. You can then start terminating processes until you can begin executing commands in the xterm. By examining the output of top you may also be able to determine which processes are filling up the process table and diagnose the problem causing the table to fill up.
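Once enough process slots are free to run a pipeline again, counting processes by command name is one way to see what is filling the table. This is a sketch, not part of the Auto–Nowcast tools; top_commands is a hypothetical name.

```shell
# Hypothetical helper: read one command name per line on stdin and print
# the most frequent names with their counts, highest first.
top_commands() {
    n=${1:-5}   # how many names to show
    sort | uniq -c | sort -rn | head -n "$n"
}

# Possible use ("ps -e -o comm=" lists every process name with no header):
#   ps -e -o comm= | top_commands 5
```

Note that this pipeline itself needs several free process slots, which is why exec top is the better first step while the table is still full.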
Q. The Auto–restart log for the previous few days is empty. Is this a problem?
A. The auto–restarter only outputs to the log file when a process is restarted. An empty log file indicates that no processes were restarted and that the system is running normally.
Q. Can I rearrange the location of datasets or add new machines to the Auto-Nowcast Environment?
A. Yes. In order to relocate processes and datasets to a new machine the following must be edited:
- process lists in $CONTROL_DIR/proc_list
- host data lists in $DATA_HOME/data_lists
- URLs in param files for applications which access the data.
It may be necessary to do a host_mkdata to install new data param files and do host_shutdown and host_startup to install new runtime process lists. Also you must restart all processes which access the data.
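When moving datasets off a host, it helps to list every file that still names the old host. The sketch below is an illustration under assumptions: files_referencing_host and the host name oldhost are hypothetical, and grep -r is a GNU/BSD extension rather than strict POSIX.

```shell
# Hypothetical helper: list files under the given paths that mention a
# host name. -r recurses, -l prints file names only, -F treats the name
# as a fixed string, -w matches whole words (host1 will not match host10).
files_referencing_host() {
    host=$1
    shift
    grep -rlFw -- "$host" "$@" 2>/dev/null | sort
}

# Possible use, with "oldhost" standing in for the real host name:
#   files_referencing_host oldhost "$CONTROL_DIR/proc_list" "$DATA_HOME/data_lists"
```

The same function can be pointed at the directory holding the application param files to find URLs that still refer to the old host.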
Q. Can I get software upgrades?
A. When the Auto–Nowcast Environment is installed at a field site, a snapshot of the application software is stored on disk so that any software problems that arise can be debugged with a version of the software which is running in the field. In the meantime, these same applications undergo changes at NCAR as a part of the ongoing effort to improve the software.
Depending on the magnitude of the changes that have taken place in the software at NCAR, an upgrade for any one application may be a simple matter of taking a new snapshot and copying the executable to a field machine. However, upgrading an entire Auto–Nowcast Environment may require reconfiguring the Environment to incorporate enhancements to the operating system, networking, interprocess communication, and process control.
In order to maximize stability in the Auto–Nowcast Environment, we do not install full software upgrades during an operational season. A full upgrade and reconfiguration typically takes four weeks of NCAR staff time.