Accessing the Missouri Census Data Center SAS Share Server

Overview

This document explains how to set up and use the mcdcshr SAS Share Server running on the Missouri Census Data Center's AIX platform (mcdc2.missouri.edu), located at OSEDA (Office of Social and Economic Data Analysis, U. of Missouri). It is intended for people who want to access the MCDC data collection using SAS, typically under Windows but also under Unix. Using the server requires only that you have access to SAS and a live Internet connection. You do not need a userid on the mcdc2 system, but the server (service) does have its own password that you will need to know. Our purpose here is to give you just the information you need to access this particular service; we make no attempt to explain the hows and whys of SAS Share servers in general.

Adding an Entry to Your System Services File

This turns out to be the hardest part for many people, but it really need not be. It is a one-time setup item that simply involves editing a special file on your system, adding a single line that identifies the mcdcshr service. You need to know two things:
  1. What file to edit (we know it's called the services file, but what is its name and where can I find it on my system?)
  2. Edit it how? I.e. what do I have to add to this file so that it knows about this new service?
For Windows users, the answer to Question 1 is somewhat dependent on the version of Windows. And even then, the exact name may change over time as Microsoft sees fit. On my Windows XP machine circa 2003 the services file is
C:\WINDOWS\system32\drivers\etc\services .
Two years ago I was using Windows NT and the services file was
c:\winnt\system32\drivers\etc\services .
There appears to be a pattern: the file is named services and sits in the system32\drivers\etc folder under the Windows installation directory. If you use SAS in a shared-resource setting where you are not authorized to modify critical system settings on your machine, you may need someone who is authorized to make this change for you. Regardless of who does it, here is what has to be done:

Find your services file and open it in a text editor such as Notepad or UltraEdit (you could even use the SAS program editor). Do NOT use a word processing application like MS Word or WordPerfect: the services file is a simple text file and needs to remain so. Append the following line at the end of the file:

mcdcshr          8012/tcp                           #--SAS/Share server on mcdc2 

The name mcdcshr is the handle that allows applications to associate this entry with requests to utilize the service. 8012 is a port number and tcp is a communications access method, two items that are required to complete access to the service. You don't really have to know how or why this works. The # indicates a comment and the characters that follow are just to help you remember what this entry is for.

Save the file with the new line added, then reboot your machine so that the operating system reads and processes the change. You only have to do this once, unless we decide to change the port or the name of the server.
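
One way to confirm that the new entry was saved is to read the services file back from SAS itself. The following is only a minimal sketch: it assumes the Windows XP path cited above (substitute the location on your own system), and the variable names line and found are purely illustrative.

```sas
/* Scan the services file and echo any line that mentions mcdcshr.       */
/* ASSUMPTION: the path is the Windows XP location cited above. Change   */
/* it to match your own system.                                          */
data _null_;
  infile 'C:\WINDOWS\system32\drivers\etc\services' truncover end=last;
  input line $char200.;              /* read one line of the file as text */
  retain found 0;                    /* remembers whether we saw a match  */
  if index(lowcase(line), 'mcdcshr') > 0 then do;
    put 'Found entry: ' line;        /* written to the SAS log            */
    found = 1;
  end;
  if last and not found then
    put 'No mcdcshr entry found in the services file.';
run;
```

If the entry is there, the SAS log should show the line you added, including the 8012/tcp port specification.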

If you want to access the server from a Unix system, you need to add the same entry to the services file on the Unix box (normally /etc/services). You are much more likely to need help from your Unix system administrator for this, since users typically are not authorized to modify the services file on a Unix system.

Code to Add to Your SAS Programs

Before SAS can access the SAS/Share server it has to know where the server is on the Internet. You tell it this by defining a macro variable whose value is the server's Internet address (a host name or an IP address). For this server it means you should code
%let mcdc=mcdc2.missouri.edu;
Now to point to this server you will use the name mcdc.mcdcshr . The part before the period references the machine (SAS knows to look for a macro variable and resolve it to get the address); the part after the period identifies the specific server running on that machine (platform). It is also used to locate the entry in your services file in order to get the port and tcp access method information.

To actually access data on the server you have to code a special form of the SAS libname statement. This form looks like the following example:

libname pums2000 server=mcdc.mcdcshr sapw=<password-goes-here>;
As with any libname statement, the word following libname is called the libref. What a libname statement usually does is associate a path (directory) with this libref. The usual syntax requires the path to be specified in quotes following the libref, but no such path is specified here. The reason is that, as we have configured the mcdcshr service, we neither allow nor require the path to be specified (some SAS Share servers do allow or require it). This server has been configured with a fairly long list of pre-defined librefs, which generally correspond to the subdirectories of the /pub/data public data directory. You can use the service only to access these pre-defined data libraries, and only by using the pre-defined names.

For example, to access the 1990 Public Use Microdata Sample data stored in the pums90 subdirectory of /pub/data you have to use
libname pums90 server=mcdc.mcdcshr;
Why "pums90"? Why not "pums1990", or simply "pums"? Because the libraries and librefs have already been pre-defined, and "pums1990" was not defined. It turns out "pums" was, as an alias (or alternate name) for pums2000. So coding
libname pums server=mcdc.mcdcshr;
points to the same library as the same statement using the pums2000 libref.

Note that we included a password parameter in the first sample libname statement shown above, but not on the ones after that. It turns out that you only have to specify this password once per batch program or interactive SAS session. After that, SAS remembers that you have passed that test and does not require you to specify the password again. So all you have to do is include in your autoexec.sas file one libname statement that specifies the sapw value. This not only gives you access to that directory but also means you never have to code the password on any other libname statement. Similarly, you can code the %let statement that defines the value of mcdc as part of your autoexec, and then you won't have to worry about that either.
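
Put together, an autoexec.sas along these lines would cover both one-time items. This is a sketch assembled from the statements shown above, not a tested configuration: the password is deliberately left as the same placeholder used earlier and must be replaced before the file will run, and pums90 appears only to illustrate a later libname with no password.

```sas
/* autoexec.sas: statements SAS runs at the start of each session        */

/* Host name of the machine running the server                           */
%let mcdc=mcdc2.missouri.edu;

/* The first libname supplies the server password.                       */
/* PLACEHOLDER: replace <password-goes-here> with the actual password.   */
libname pums2000 server=mcdc.mcdcshr sapw=<password-goes-here>;

/* Later libnames in the same session can omit sapw                      */
libname pums90 server=mcdc.mcdcshr;
```

Since this file would hold the server password in plain text, it is probably wise to keep it somewhere other users cannot read.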

How Do I Know What Data Is Available?

That is a whole other subject. This document deals with the mechanics of accessing the datasets; it does not cover how to find out what data is out there or how to use it. To get more information about the data collection we recommend that you use the uexplore web application (which runs on the mcdc2.missouri.edu box, even though its initial entry page is on a different server). That page contains paragraph-length descriptions of many of the most pertinent "filetypes" (/pub/data subdirectories), and these names are the ones used for the hyperlinks on the page.

You can view the sas_shr_libs.sas file that is used to define all the available librefs/filetypes when using the mcdcshr server. We try to keep this list up to date, so that when new filetypes are added, a libname statement is added to that page. But sometimes there can be a short lag between when the new type is created and it gets defined on the server. Feel free to remind us to add the new entry if you find one that appears to be omitted.

But How Do I Actually Access the Data?

It's really quite easy once you've coded the libname statement and you know what dataset you're interested in. Say, for example, that you have coded and executed this statement:
libname pums server=mcdc.mcdcshr;
(Remember, pums is defined as an alias, or alternate name, for pums2000). What if you wanted to see a list of the datasets within this directory? Just use the SAS Display Manager "dir" window. Type:
dir pums
and SAS will open a new DIR window and display the datasets. If you right-click on a file in the DIR window (sometimes you have to hold the button down) you can then select the "View Columns" option. This opens a window that gives you a nice Proc Contents-like description of the variables that comprise the dataset. If you prefer the direct method with a bit more typing, you can skip the "dir pums" command and go directly to (i.e. enter the command)
var pums.moprecs5
which will result in the variable properties window opening with information about the moprecs5 dataset.
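
In a batch program, where the Display Manager windows are not available, much the same information can be written to the output listing with Proc Contents. A short sketch, assuming the pums libname statement above has already been executed in the session:

```sas
/* List the datasets in the pums library; nods suppresses the            */
/* variable-level detail for each member                                 */
proc contents data=pums._all_ nods;
run;

/* Describe the variables in a single dataset                            */
proc contents data=pums.moprecs5;
run;
```

The first step plays the role of the DIR window and the second that of the variable properties window.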

So what you have here is fast, convenient access to these data stored on a remote server (although how fast and how convenient can depend on the speed of your Internet connection). If it's a bit slow, it is easy to create a quick sample of the remote file that you can then examine on your local machine. To get a local copy of the first 100 rows (observations) of the moprecs5 dataset, just submit this code:

data moprecs5; set pums.moprecs5(obs=100); run;
Pretty simple. You can also copy the entire dataset over by omitting the "(obs=100)" spec, but then you have to have room on your local machine to store it. Since many of the datasets on the server can be quite large, you need to be careful when doing this. A good strategy is to determine a subset of rows and columns that you think you might want to use and create a local copy of just that extract. Something like:
     libname project 'c:\myprojects\pums2k';  
     data project.sample;  
       set pums.mohrecs5(keep=puma hweight tenure .... /* choose variables here */ );
       *---select rows/observations to keep using a where filter---;
       where puma in (...list of selected puma codes...) and tenure='1'; 
       drop tenure; *--we had to keep it for use in the where filter but now we can drop it--;
       run;