Setting up an Archive Site


written by Joey Mukherjee - joey@swri.org
Last Updated: 2/2/01

Table of contents

  1. Overview
  2. Getting Started
  3. Using the WWW Instead
  4. Setting up a Server
  5. The Hello Program
  6. The FileGet Program
  7. The EntriesGet Program
  8. Examples
    1. The Hello Program
    2. The FileGet Program
    3. The EntriesGet Program

Overview

Being a server site entails devoting some hardware to store the data and a person who will serve as a site administrator. It helps if the site administrator is also a programmer or can delegate some software tasks to a programmer. Once a server site is set up, it is usually hands-free until there is more data to archive.

Furthermore, we only supply the interface which you are expected to follow. If you work within this interface, all SDDAS software will work with it. That's the good news; the hard part is getting everything working. If you need help, by all means, contact me with your questions.

Getting Started...

First, determine how you want your archive site to work and what resources you can devote to it. For instance, we store data on a 500-CD jukebox and on a 27 GB disk farm. You can devote as much or as little as necessary to service your users. By users, we do not just mean users at your site. If you set up an archive site, you can service the world.

If you are not familiar with the term promote, it simply means getting a piece of data (be it satellite data or a program) from the server site to the client site. We will use this term a great deal since applications can be promoted, data can be promoted, and meta data can be promoted.

The next step is to create the programs necessary to get the data from your site to the user's (or client) site. There are three interfaces which you need to set up to allow SDDAS to work. They are as follows:

  1. Hello - This will describe the server.
  2. FileGet - This will produce a file given a hierarchy and a time.
  3. EntriesGet - This will produce the meta data given a hierarchy and a time.

I recommend you do them in that order since that will be the easiest way to test them.

Using the WWW Instead

The old way of running a server was to run our daemon, the sd_rshd program. However, that program is (and really always has been) a security risk. To counter that, we encourage the use of a web server, which has been tried and tested in many more settings. You can use a web server such as Apache in place of our daemon, and if clients have later versions of our software, it will work just fine.

To see how to set up your server with a web daemon, read our Web Server document. If you wish to remain backward compatible, this document is still valid. Furthermore, there are some tips in here which may make setting up the web interface much easier.

Setting up a Server

The server daemon is just a program which shells out to other programs to do something and then returns the results back to the client. If you are familiar with web servers, it is just the "cgi-bin" portion of one.

Any command that the server runs must write its output to standard output. All error messages must be written to standard error. In PERL, this is easy. The "print" command will send something to standard output and the "print STDERR" command will send something to standard error.

The other requirement is that the scripts return an error code of 0 if everything goes fine or an error code of -1 if not.

On the client end, the standard output will be written to a file which is given on the command line. Any information sent to standard error will be shown to the client on the screen. This allows a great deal of flexibility in status messages from the server. Keep in mind that some languages (e.g. C and PERL) do not send output immediately and will instead buffer all messages until a newline character is sent. If you are expecting a status message and do not see it, make sure you send a newline character (usually \n).
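The rules above (data on standard output, status messages on standard error, newline-terminated and flushed, exit code 0 or -1) can be sketched as follows. This is only an illustration in Python, not part of SDDAS; the names status and run_command and the sample payload are hypothetical.

```python
import sys


def status(message, stream=None):
    """Write one status line to standard error and flush it immediately."""
    stream = sys.stderr if stream is None else stream
    # Always end with a newline so buffered output is not held back.
    stream.write(message.rstrip("\n") + "\n")
    stream.flush()


def run_command(args, out=None, err=None):
    """Hypothetical server command: returns 0 on success, -1 on failure."""
    out = sys.stdout if out is None else out
    if not args:
        status("Usage: command <argument> ...", err)
        return -1
    status("Working on request: " + " ".join(args), err)
    # Only the data itself goes to standard output, never status text.
    out.write("payload for " + " ".join(args) + "\n")
    out.flush()
    return 0
```

A real script would finish with sys.exit(run_command(sys.argv[1:])). Note that on Unix an exit status of -1 is seen by the calling process as 255.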

The server daemon is installed with SDDAS and is called sd_rshd. Even though the daemon is installed with SDDAS, it is not running until told to do so.

Running sd_rshd is fairly straightforward: run it with the name of the directory that holds the server commands on the command line. For example, if you have the server commands in a directory /sddas/ServerCmds, the command to run the server is "sd_rshd /sddas/ServerCmds".

All server commands must be kept in a separate directory. We have always kept ours in $SDDAS_HOME/ServerCmds. This is not a requirement, but feel free to be consistent with ours. It might make things easier if you come to us for help. Furthermore, the rest of this document assumes you have the server commands in a ServerCmds directory.

We have included two scripts which should be modified to start and stop the server. They are also in the $SDDAS_HOME/bin directory and are called StartServer and StopServer respectively.

After you get the server running, it is important that the server be restarted when the machine is restarted, but the server must not be run as root. On Solaris, you can do this with a script in the /etc/rc3.d directory. Call the script "S89sddas"; its contents will be the line:

    su - <user name> -c "<SDDAS location>/bin/sd_rshd ServerCmds"

Replace "user name" with the name of the user who ought to start the server. This user must have permission to read all the master database entries as well as run programs on the server which facilitate getting the data.

Replace "SDDAS location" with the location of $SDDAS_HOME. Do not use environment variables as they may not be set when this script runs.

There is another program similar in name to sd_rshd called sd_rsh. This is used by the client to communicate with the server. You will not need to do anything with it; however, if you ever want to manually test the server, this is the program to do it.

Each command in the ServerCmds directory must send the required information to standard output. This information is then piped back to the client with the server software.

The Hello Program

The Hello program is currently not in major use, but it will be soon, so please implement it. It is actually the simplest one to implement. All the Hello program will do is describe the server to the outside world and possibly list a contact person and maybe some machine information. Send all this information to standard output.

The FileGet Program

FileGet's main responsibility is to get a file from your archive site and send that file (even if it's binary!) to standard output. Typically, standard output is the screen and sending a binary file to the screen is a disaster, but in the server sense, it's what you want to do since the sd_rsh program takes standard output from the server and writes it to a file.

FileGet will receive commands in this format:

<type> <P> <S> <E> <I> <V> <Byr> <Bday> <Bmsec> <PreFlg> <PostFlg> <Label>

type is either P, V, H, or D, which stand for PIDF, VIDF, Header, and Data file respectively.

P - Project name
S - Satellite/Mission name
E - Experiment name
I - Instrument name
V - Virtual name

The next three fields are the begin time of the range requested. This must then be converted to a filename.

Byr - year
Bday - day
Bmsec - millisecond

PreFlg - not really supported yet, but assume this to be zero. This is for any preprocessing that needs to be done to the data by the server.

PostFlg - what the client will do to the data. If the PostFlg is 2, that means the file it is expecting should be gzipped. If the PostFlg is 4, the file requested is a gzipped VIDF file.

Label - this field ought to be ignored! Earlier, it stood for where the server should get its data. However, it is up to the server now to keep up to date where its data is and go get it.

With this information (and only this information from the client), FileGet needs to know enough about your own system to locate the data. This can be as simple as going to a directory on the hard disk and then "cat"ting the file to standard output. For some of our servers, the file is kept on magneto-optical disks, and the server determines which disk the file is on and retrieves it from there.
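As a rough illustration of the request format above, the sketch below parses the twelve fields and copies a file to standard output in binary mode. It is written in Python and is not the SDDAS implementation; the names FileGetRequest, parse_request, and send_file are hypothetical. How the fields map to a file name is left out, since that depends entirely on how your archive is laid out.

```python
import shutil
import sys
from collections import namedtuple

# Field order follows the FileGet command format given above.
FileGetRequest = namedtuple(
    "FileGetRequest",
    "file_type project satellite experiment instrument virtual "
    "byr bday bmsec preflg postflg label",
)


def parse_request(argv):
    """Turn the twelve command-line fields into a FileGetRequest."""
    if len(argv) != 12:
        raise ValueError("expected 12 fields, got %d" % len(argv))
    req = FileGetRequest(*argv)
    if req.file_type not in ("P", "V", "H", "D"):
        raise ValueError("unknown file type: %r" % req.file_type)
    # Times and flags arrive as text; convert them to integers.
    return req._replace(
        byr=int(req.byr), bday=int(req.bday), bmsec=int(req.bmsec),
        preflg=int(req.preflg), postflg=int(req.postflg),
    )


def send_file(path, out=None):
    """Copy a file, byte for byte, to standard output."""
    out = sys.stdout.buffer if out is None else out
    with open(path, "rb") as src:
        shutil.copyfileobj(src, out)
    out.flush()
```

Opening the file in binary mode and writing to sys.stdout.buffer avoids any text translation, which matters because the client writes standard output straight into a data file.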

The EntriesGet Program

EntriesGet will receive commands in this format:

<P> <S> <E> <I> <V> <Start Time> <Stop Time>

P - Project name
S - Satellite/Mission name
E - Experiment name
I - Instrument name
V - Virtual name

Start Time - the start time of the range requested
Stop Time - the end time of the range requested

EntriesGet must look at your master databases and return the meta data available in that range. Some example source code is available for this.

Example: The Hello Program

Here is an example of the Hello script, written in C shell. We use it on cluster.space.swri.edu, which serves no data but serves all applications.

#!/bin/csh -f

set machine_info = `uname -a`
echo ""
echo "This is the main development site for all of SDDAS."
echo "This machine serves no data, only binaries."
echo ""
echo "The machine specifics: $machine_info"
echo ""

Example: The FileGet Program

This is probably not the best example since it is written in C shell; however, it shows the basics of determining where a file is and then sending it to standard output with "cat".

#!/bin/csh
#

set archive_dir = /p1/archive_data
set MakeFileName = MakeFileName.csh
set PostProc = PostProc.csh
set stderr = stderr

## echo "FileGet $#argv args >$argv<"

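if ($#argv != 12) then
    # Usage errors go to standard error, since standard output is written to the client file.
    # The stderr command is assumed to be a local helper that writes its arguments to standard error.
    $stderr "Usage: $0 <type> <P> <S> <E> <I> <V> <Byr> <Bday> <Bmsec> <PreFlg> <PostFlg> <Label>"
    exit -1
endif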

set Type  = $argv[1];
set Proj  = $argv[2];
set Sat   = $argv[3];
set Exp   = $argv[4];
set Inst  = $argv[5];
set Vinst = $argv[6];
set Byr   = $argv[7];
set Bday  = $argv[8];
set Bmsec = $argv[9];
set Pre   = $argv[10];
set Post  = $argv[11];
set Label = $argv[12];

set File = `$MakeFileName $Vinst $Byr $Bday $Bmsec`

switch ($Type)
case "H":
    set F = ${File}H;
    breaksw
case "D":
    set F = ${File}D;
    breaksw
case "V":
    set F = ${File}I;
    breaksw
case "P":
    set F = ${Vinst}.pidf.v2;
    set Post = 2;
    breaksw
case "S":
    set F = ${Vinst}.scf;
    breaksw
endsw

set ext = `$PostProc $Post`

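# Tell the client which file is being sent (stderr is assumed to be a local helper that writes to standard error).
$stderr "$F$ext"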

set file = $archive_dir/$Proj/$Sat/$Exp/$Inst/$F$ext

if (-f $file) then
   cat $file
   exit 0
else
   exit -1
endif

Example: The EntriesGet Program

You might be able to use this example EntriesGet verbatim. It makes use of the program "entriesget" which is available from us. It happens to take the same parameters as the EntriesGet script.

This is a PERL script. One thing it does is check whether a range has already been promoted within the last minute; if it has, the script sleeps for five seconds and then exits with a success code (0) so the client will think everything is fine.


#!/opt/local/bin/perl -w
#

$SDDAS_HOME = $ENV{"SDDAS_HOME"};
$SDDAS_DATA = $ENV{"SDDAS_DATA"};
$history_file = "history.meta";
$sleeptime = 5;

##print "EntriesGet $#ARGV args >@ARGV<\n";

if ($#ARGV != 6) {
    print STDERR "Usage: $0 <P> <S> <E> <I> <V> <Start Time> <Stop Time>\n";
    exit -1;
}

$str = join (" ", @ARGV);
if (&CheckTime ($str) == 0) {
    system ("entriesget @ARGV");
    $retcode = $? >> 8;
    if ($retcode != 0) {
        print STDERR "The files might not exist on this server!\n";
    }
}

exit (0);

sub CheckTime {

    local ($search_str) = $_ [0];

    $currtime = time ();
    $add_str = "$currtime - $search_str\n";

    @histlist = ();
    if (open (HISTORY, $history_file)) {
        @histlist = <HISTORY>;
        close (HISTORY);
    }

    # \Q...\E keeps characters in the request from being treated as regex operators.
    @foundlist = grep (/\Q$search_str\E/, @histlist);
    if (@foundlist > 0) {

        ($oldtime, $search_str) = split (/ - /, $foundlist [0]);
        if (($currtime - 60) < $oldtime) {
            sleep ($sleeptime);
            return 1;
        }
    }

    unshift (@histlist, $add_str);
    splice (@histlist, 11);
    # Only rewrite the history if the file can actually be opened for writing.
    if (open (HISTORY, "> $history_file")) {
        foreach $item (@histlist) {
            print HISTORY "$item";
        }
        close (HISTORY);
    }
    return 0;
}
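
The duplicate-request check performed by CheckTime can be restated without any file handling, which may make the logic easier to follow or to port. The sketch below is a Python paraphrase, not a replacement for the script; the name check_recent and its parameters are hypothetical, and it compares requests for exact equality where the PERL version uses a pattern match.

```python
def check_recent(history, request, now, window=60, keep=11):
    """Return (is_duplicate, new_history).

    history is a list of (timestamp, request) pairs, newest first.
    """
    for stamp, old_request in history:
        if old_request == request:
            # Only the first (newest) match is examined, as in CheckTime.
            if now - window < stamp:
                return True, history
            break
    # Not a recent duplicate: record the request and trim the history.
    new_history = [(now, request)] + list(history)
    return False, new_history[:keep]
```

A request seen again within the window is reported as a duplicate and the history is left untouched; otherwise the request is put at the front and only the newest entries are kept.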