WE EXPLORE Missouri State Census Data Center

The Hypercon Application

Overview

Hypercon is a cgi-bin application written in SAS(r) that displays the "contents" (i.e. the metadata, or information about the data) of either a single SAS data file or an entire SAS data library (the collection of all SAS data files within a directory). It is actually a set of two closely related modules that generate two quite distinct kinds of reports:

  • directory reports -- reports that summarize SAS data files (datasets) within a directory. Information in these reports is mostly attributes of entire files (datasets).
  • dataset listing reports -- reports presenting information for a single SAS dataset. Information in these reports is mostly attributes of variables within a specific dataset.

While hypercon can be invoked directly from its own home page, http://www.oseda.missouri.edu/uic/uicapps/hypercon.html, which requires the user to type a complete path and SAS dataset name, it is most commonly and easily invoked using the uexplore navigation program and the sasapps.pl intermediate application. When this method is used, the path and SAS dataset name are passed along and are already filled in on hypercon's main menu page.

Sample Run

The easiest way to understand hypercon is just to invoke it and read the screens carefully. So let's do an example. Invoke the uexplore application and follow the links to the stf903x directory (filetype). You can do this directly using the URL http://www.oseda.missouri.edu/cgi-bin/uexplore?/mscdc/data/stf903x@secure .

Take a minute to look at all the nice file descriptions (you will not find such complete and useful descriptions in many of the filetype directories.) Scroll down near the bottom and select (click on) the file us.ssd01. This should bring up the sasapps intermediate screen; click on the pull-down menu and select Hypercon-SAS dataset contents with Directory Option. Then click on the Submit Request button.

You should now be presented with the Hypercon main menu page. You should note that the form-entry field for the Directory (path value) has already been filled in (the uexplore program told the Hypercon program where you were). As you scroll down the page you'll see a lot of buttons and boxes that you can use to control the details of the report(s) you are about to request. Two of the key items among these options will already be filled in based on what you have already told uexplore:

  • Under Directory Options/Report Options the choice has already been made to have a Directory report and a detailed dataset listing generated.
  • Under Data Set Listing Options/Data Set option the value "_ALL_" will have been entered for you.

For now, just ignore all the rest of the options on this page. (Most users can ignore them every time since it is rare for any but the pickiest of users to want anything other than the pre-set defaults.) Just go ahead and click on the Submit Request button near the top of the page. You should now experience a wait of several seconds while the program does what it has to do in order to dynamically assemble and format the report you have requested.

The next page you should see is an HTML table containing information about all the SAS datasets in the current directory. You should recognize (in the column headed "Description") the text that appeared in the menu page you saw earlier -- the one from which you chose the us.ssd01 file as the one you wanted to access. Since you chose the Hypercon with directory option, what you have is information about all the SAS datasets, not about the variables of the particular one you chose (us.ssd01). The report gives some basic information about each dataset: when it was created, how many observations and variables (like rows and columns, records and fields) it contains, its total size in bytes (in case you are considering downloading the whole thing) and, of course, the descriptive text associated with each dataset. (Total observations information will not be available for SAS views.)

Hyperlinks from Directory Table

The important thing to notice about this page is that the SAS set names are highlighted as hyperlinks. The Hypercon-directory program has not only created this report for you (which, of course, you can readily send to your printer or capture as a document image on your local system), but has also made it very easy for you to "drill down" to get more detailed information about any of the datasets summarized in this report. Scroll down to the row with the "US" entry and point to this name without clicking. Look at the underlying URL, which should be displayed at the bottom of your browser window. You'll see that it's one of those long hairy cgi-bin URLs with several parameters and other things you'd rather not have to be bothered about (in fact, the URL is so long you probably won't be able to see all of it in the window). If you are curious (and you really don't have to be) you could use your browser's View-Document Source buttons to see the complete underlying HTML.

Notice that the "US" entry here is really the same thing that appeared as "us.ssd01" on the earlier selection menu. The SAS program has up-cased the name for its own use, and has dropped the ".ssd01" extension, which it assumes (otherwise SAS would not recognize it as a SAS dataset.)
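
The name conversion just described can be sketched in a few lines of Python. This is only an illustration of the rule (drop the assumed ".ssd01" extension, then up-case); the function name is made up for this example and is not part of Hypercon.

```python
def sas_dataset_name(filename: str, extension: str = ".ssd01") -> str:
    """Return the set name SAS would report for a file, e.g. 'us.ssd01' -> 'US'."""
    name = filename
    # Drop the assumed extension if it is present (compared without regard to case).
    if name.lower().endswith(extension.lower()):
        name = name[: -len(extension)]
    # SAS up-cases the name for its own use.
    return name.upper()


print(sas_dataset_name("us.ssd01"))
```

A name that already lacks the extension, such as "US", passes through unchanged.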

Now go ahead and click on the "US" set name entry/link. Once again, you may experience a several-second wait as the SAS program requested by the URL executes and builds the page you are now about to receive "on the fly".

The "Hyper-Contents" Report

If all goes well the next page that should be displayed in your browser window is the Hyper-contents report. This report contains detailed information about the specific SAS dataset requested. There is a lot of useful data here, although how useful some of it may be will depend on how you plan to use it. (Persons familiar with the SAS software package will recognize this output as similar to what they get when they run the SAS utility module proc contents; this is the origin of the "con" in "hypercon".)

Variable attributes and format value-label links

The top portion of this page repeats information previously displayed in the Directory report: items such as the set name, label, description, number of observations and number of variables. This is followed by an HTML table in which the rows correspond to the variables in the dataset and the columns contain attributes of those variables. Some of these attributes are somewhat technical and will be of interest primarily to programmers, but others provide important descriptive text.

The LABEL column displays the SAS variable label, if any. The LEN field indicates the type (SAS variables are stored as either character strings or numeric items in floating point format) and the size of the variable in bytes. Character variables are identified by a "$" preceding the length value. Thus a LEN-column value of "$4" means you have a character string item 4 characters long; a value of just "4" (no $) means it is a 4-byte floating point item (which means it can hold up to 7 significant digits).

The FORMAT field is also somewhat technical and SAS-specific. A format in SAS controls how a variable is to be displayed; a value of "5.1" for a numeric variable means it will be displayed as 5 characters with one digit after the decimal point (rounded), e.g. "123.4", while a value of "comma7." would display something that looks like "123,456". Not all variables have (or need to have) a format associated with them. SAS also allows users to create their own custom formats that serve as "value label" generators for variables. These are generally used to translate non-mnemonic code values to meaningful names. For example, if a variable contains a 4-character FIPS (Federal Information Processing Standard) metropolitan area (MSA or CMSA) code, it is useful to define a SAS format that will translate a value such as "7040" to a meaningful label such as "St. Louis, MO-IL". The MCDC data archive makes extensive use of such format codes and maintains a library of them.
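
If you think better in code, here is a minimal Python sketch of how the LEN and FORMAT conventions above can be read. The function names are invented for this example, and the two display helpers are only rough Python analogues of the SAS formats 5.1 and comma7., not SAS itself.

```python
from typing import NamedTuple


class VarLength(NamedTuple):
    var_type: str  # "character" or "numeric"
    nbytes: int    # storage length in bytes


def parse_len(value: str) -> VarLength:
    """Interpret a LEN-column value such as '$4' or '4'."""
    text = value.strip()
    if text.startswith("$"):
        # A leading "$" marks a character variable.
        return VarLength("character", int(text[1:]))
    return VarLength("numeric", int(text))


def show_5_1(x: float) -> str:
    """Approximate the SAS format 5.1: width 5, one digit after the decimal point."""
    return format(x, "5.1f")


def show_comma7(x: float) -> str:
    """Approximate the SAS format comma7.: width 7 with thousands separators."""
    return format(round(x), "7,d")


print(parse_len("$4"), parse_len("4"))
print(show_5_1(123.44), show_comma7(123456))
```
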
These format codes are often permanently associated with variables within the archive. For example, if you are browsing the Hypercon report for the stf903x.us dataset (the standard way to refer to SAS datasets within SAS and within the archive is this 2-part naming convention: the filetype, a dot, and then the SAS dataset name, without the extension), then you should see that the 11th row of the table has an entry of $METRO. in the FORMAT column. Format codes are supposed to have periods associated with them (it's a SAS syntax thing, in case you care), and yes, the format name also appears to be a hyperlink. So go ahead and click on it.

The application retrieves the format module from the library of such modules we mentioned (it's the directory /mscdc/sasfmats which you can uexplore separately if you have any interest in doing so) and simply displays it as a plain text file in your browser. This can sometimes be useful in helping you "look up" a code that you need to use when trying to filter on (i.e. select based on the value of) one of these codes when doing an extract. In this case, the $METRO source module is displayed for you. To find out what the code is for Omaha you could use your browser's find command and go directly to that entry. The line containing '5920'='Omaha, NE-IA' means just what it looks like -- the correct code for Omaha is 5920. Use the back key to return to the Hypercon report.
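
If you save the format source text to your own machine, a small script can do the "look up" for you. The Python sketch below assumes that the entries look like the 'code'='label' line quoted above; the sample text contains only the two entries mentioned in this document, and the function names are hypothetical.

```python
import re

# Matches entries of the form 'code'='label', like the $METRO line quoted above.
ENTRY = re.compile(r"'([^']*)'\s*=\s*'([^']*)'")


def parse_value_labels(source: str) -> dict:
    """Collect code -> label pairs from the text of a format source module."""
    return {code: label for code, label in ENTRY.findall(source)}


def find_codes(labels: dict, needle: str) -> list:
    """Return every code whose label contains `needle`, ignoring case."""
    return [code for code, label in labels.items() if needle.lower() in label.lower()]


# Illustrative excerpt only, not the real contents of the $METRO module.
sample = """
'5920'='Omaha, NE-IA'
'7040'='St. Louis, MO-IL'
"""
labels = parse_value_labels(sample)
print(find_codes(labels, "omaha"))
```

This simple pattern does not handle labels that themselves contain quote marks, so treat it as a starting point only.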

Descriptions and Labels

One of the rather dumb things you should notice about this report is that it has two columns, one called "LABEL" and one called "DESCRIPTION", that contain the identical text (at least as of 9/97 they did; this could change by the time you run this.) It's not quite as dumb as it looks; it just means we haven't completed work on something. Many Hypercon dataset reports will not have a DESCRIPTION column. It only appears if the program finds a special Metadata.html file associated with the dataset. When it does find one, it then searches for descriptive text in that file and displays it in this column. Sometimes we create very good (long, anyway) Metadata files and the descriptive text can get rather lengthy. LABEL values, on the other hand, are built into the SAS file architecture and are (currently, in Version 6 -- scheduled to change in Version 7) limited to 40 characters. So, why do they (LABEL & DESCRIPTION) sometimes contain exactly the same information? Because we have a utility program that we use to generate a "skeleton" version of our Metadata files. That program has an option that says to use the variable label as the default initial value for the descriptive text entry. What we plan to do when we create a Metadata file is to edit it extensively to add more useful annotations and descriptions of the variables in the dataset(s). But sometimes there is a time gap between when we generate the initial ("skeleton") version and when we get around to creating a more useful complete Metadata file. A good thing to do when you recognize this is happening is to use the options back on the Hypercon menu form to turn off either the Label or the Description fields. (Yes, I know, it's usually too late for that - but in case you were going to do another report for other datasets in the same directory.)
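
To make the "skeleton" idea concrete, here is a hedged Python sketch. It is not our actual utility program; the function names and the sample variable labels are invented for illustration. It shows the default of copying each label into the description, and a check for the situation where the DESCRIPTION column adds nothing.

```python
MAX_LABEL_LEN = 40  # label limit in SAS Version 6, as noted above


def skeleton_descriptions(labels, use_label_as_default=True):
    """Build the initial description text for each variable."""
    return {var: (label if use_label_as_default else "") for var, label in labels.items()}


def description_is_redundant(labels, descriptions):
    """True when every description is just a copy of the variable label."""
    return all(descriptions.get(var) == label for var, label in labels.items())


# Hypothetical variable labels, each within the 40-character limit.
labels = {"STATE": "State FIPS code", "COUNTY": "County FIPS code"}
skeleton = skeleton_descriptions(labels)
print(description_is_redundant(labels, skeleton))
```

Once somebody edits even one description, the check returns False and the DESCRIPTION column is worth keeping.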

The Obs Columns

Unless you are color blind or not really paying attention, you should notice that the rightmost three columns of the Hypercon dataset contents table are a different color (most browsers should display them with a light blue background in contrast to the yellow background of the rest of the table). These columns have titles (across the top) of "Obs 1", "Obs 2" and "Obs 3". What these columns contain is actually not metadata, but what we sometimes refer to (for want of a more technical term) as "data data". But it's regular data data being used as a kind of metadata. The reason we are showing you the actual values for the variables is simply to give an example of what the data stored in each variable looks like. (The expression around our shop is that "One data data is worth a dozen SAS attribute codes.") As we'll discuss below, there are options you can specify to suppress these columns or to alter which observations are displayed. (But you really should not try to use Hypercon to display data values - xtract is the appropriate tool for doing that.) Note that values displayed in these columns are formatted, i.e. when a format code is associated with the variable, the value displayed here is the result of converting the stored value using that format code. You may see "St. Louis, Mo-Il MSA" displayed when the actual value in the dataset is "7040". This can be important when you do a data extraction.
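
Here is a rough Python sketch of what the Obs columns amount to: the first few observations are turned on their side (one row per variable) and any value-label format is applied before display. The variable name MSA and the function name are hypothetical; the code/label pairs are the ones mentioned earlier in this document.

```python
def obs_columns_table(rows, formats, first=1, count=3):
    """Return {variable: [displayed value for each sampled observation]}."""
    sample = rows[first - 1 : first - 1 + count]
    table = {}
    for var in (rows[0] if rows else {}):
        labels = formats.get(var, {})
        # Show the formatted label when one exists, otherwise the stored value.
        table[var] = [labels.get(obs[var], obs[var]) for obs in sample]
    return table


rows = [{"MSA": "7040"}, {"MSA": "5920"}]  # stored (raw) values
formats = {"MSA": {"7040": "St. Louis, MO-IL", "5920": "Omaha, NE-IA"}}
print(obs_columns_table(rows, formats))
```

Notice that the stored values are still "7040" and "5920"; those are what you would filter on in an extract, not the labels you see displayed.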

Direct Link to Extract

At the bottom of the Hyper-Contents report page there is a fairly standard set of links to take you to either the main menu page of the Hypercon application (you probably do not want to do this unless you know where you want to go next and can get there without the help of uexplore) or the UIC home page. But a much more useful link is the one that will take you straight to the xtract application. This will be useful when you are surfing the archive for interesting data and have just found something; or maybe you are not sure just what sort of values are in some variable and you want to get a quick listing of the first 50 or 100 observations. The best way to do this is to jump from metadata to data data, from hypercon to xtract. This shortcut link saves you from having to do a series of back clicks to return to the point where you can select the dataset and choose the alternate SAS application.

The Hypercon Menu Pages -- Customizing the Reports

The large majority of users, in the large majority of cases, will probably have little need for most of the array of options available to control the exact format and content of the reports produced by the application. But in case you feel the need to customize, this section explains what the various options do. You can read this, or you could just pick a small dataset and experiment a little. We assume here that you have invoked the Hypercon application with the Directory option specified.

Directory-Related Options

Figure 1 shows the portion of the Hypercon menu page dealing with options you can specify related to generating the directory report. The first and most important item is, of course, the name of the directory. If you are using uexplore to invoke Hypercon then this is easy - the value is entered for you.

Fig. 1 Directory Related Options for Hypercon

Along the left side of the form there are two sets of radio buttons. The first set, labeled Report Options, allows you to choose whether or not you even want a Directory Report. The first (default) selection says that you want a Directory report and will also be interested in drilling down to get Detailed Dataset Listings. The second option says you want only a Detailed Dataset Listing for a single data set, with no Directory report. A third and seldom-used option is to produce the Directory Report only; the set names will not be hyperlinks allowing you to go directly to detailed dataset reports. This will cause your report to run just a slight bit faster. (This option is left over from an earlier version of the application where we actually generated the detailed set reports ahead of time instead of waiting for you to select a dataset.)

The Sort Options set allows you to specify that the datasets in the Directory Report be presented in reverse order of their creation date. This is useful if you want to see which datasets may be new.

The right side of this page is just a series of checkbox options corresponding to the various items of information that are normally included about each dataset in the report. If you do not need or want to see any of these items you can un-check any of the boxes and the corresponding item will not be included in the table/report.
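
The sort option and the checkboxes can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the field names, the made-up sample entries, and in particular the default ordering by name, which this page does not actually specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ALL_COLUMNS = ("name", "created", "nobs", "nvars", "size_bytes", "description")


@dataclass
class DatasetInfo:
    name: str
    created: date
    nobs: Optional[int]  # None for SAS views, where the count is unavailable
    nvars: int
    size_bytes: int
    description: str


def directory_report(datasets, newest_first=False, columns=ALL_COLUMNS):
    """Return one dict per dataset, keeping only the checked columns."""
    if newest_first:
        ordered = sorted(datasets, key=lambda d: d.created, reverse=True)
    else:
        ordered = sorted(datasets, key=lambda d: d.name)  # assumed default order
    return [{col: getattr(d, col) for col in columns} for d in ordered]


# Made-up entries, for illustration only.
sample = [
    DatasetInfo("AAA", date(1996, 5, 1), 100, 20, 16384, "older dataset"),
    DatasetInfo("BBB", date(1997, 9, 1), None, 20, 4096, "newer SAS view"),
]
for row in directory_report(sample, newest_first=True, columns=("name", "created")):
    print(row)
```

Un-checking a box corresponds to leaving that name out of `columns`.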

Data Set Related Options

Figure 2 shows the portion of the Hypercon menu page containing options related to the individual dataset reports that will be generated. Once again, the first and most critical item that needs to be entered is the name of the data set(s). If you got to the page by specifying that you wanted a Directory listing then this box will have the special keyword _ALL_ already entered for you. If you arrived here via uexplore and specified that you only wanted to see a report for the selected dataset without a directory report, then that SAS data set name will be already entered here. Names are not case sensitive and should not include the .ssd01 extension.


Technical stuff (if this were a "Uexplore for Dummies" book it would be contained in a gray box with a Nerd icon next to it indicating that most of you will want to skip this part):

Hypercon is actually two programs that know about each other. One does the directory-level stuff and the other does the dataset-level stuff. When the directory program sees that you want _ALL_ the datasets in the directory, it goes ahead and generates the underlying URLs so that you can just click on a set name and it will generate the hypercon report for that dataset. As part of those URLs (written by the first program and invoking the second one), any optional specifications that you have entered on this main menu page get passed to the second program. ICYC ("In case you care").
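
For the truly curious, the hand-off between the two programs can be sketched in Python. The endpoint and the parameter names below ("dir", "dataset", "obs") are entirely hypothetical; the real Hypercon URLs use their own names, which are not documented on this page. The only point is that the first program writes the second program's URL, options included.

```python
from urllib.parse import urlencode

# Hypothetical endpoint standing in for the dataset-level program.
DATASET_PROGRAM = "http://example.org/cgi-bin/hypercon-dataset"


def dataset_link(directory, dataset, options):
    """Build a URL that invokes the dataset-level program for one dataset."""
    params = {"dir": directory, "dataset": dataset, **options}
    return DATASET_PROGRAM + "?" + urlencode(params)


def directory_links(directory, dataset_names, options):
    """What the directory-level program does for _ALL_: one link per set name."""
    return {name: dataset_link(directory, name, options) for name in dataset_names}


links = directory_links("/mscdc/data/stf903x", ["US"], {"obs": "1-3"})
print(links["US"])
```
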


Fig 2 Data Set Related Options for Hypercon

The Variable Sort Options let you change the order in which the variables appear in the report. The default is by relative position within the data set. This is usually preferred because we generally try to arrange the variables in a logical order -- key identifiers at the beginning and related data items grouped together. Alphabetical order may be of use for certain files where you just want to be able to check on a specific variable where you already know the name.

The Variable Descriptor Options are a series of check boxes that can be used to specify which items (columns) are to appear in the report. Number refers to the variable number (relative position within the dataset); Label refers to the variable label as stored on the data set (limited to 40 characters); Type and Length refers to the SAS variable type ("$" used in reports to indicate character type variables, followed by the length in bytes) - this item appears with its column labeled LEN; Format refers to the SAS format code used to display the variable; Informat refers to something called an Input Format in SAS, which we never use in the archive, so you should probably never choose it; and Description refers to the descriptive text for the variable that appears in a Metadata file. Many/most datasets do not have a Metadata file, and many that do have descriptive text that is just a copy of the Label text. If you choose the Description option and the program does not find a Metadata file, then the option is simply ignored.

Under Additional Variable Options we allow you to have control over which variables appear in the report. It is seldom necessary for you to worry about this, unless you are looking for information about specific variables from a very large dataset (i.e. one with lots of variables.) Hypercon has a built-in limit of 500 variables that it will display in one Data Set Listing report (without this limit the program requires too much memory and/or takes too long to produce output.) If a dataset has more than 500 variables the "Start with" and "End with" boxes let you enter the name or number of the variables you wish to see in the report. Default is to see the first 500 variables on the data set. While Hypercon will produce reports for subsets of the variables for such very large datasets, we do not recommend it. If you want to see what the key variables are for something like an STF903 dataset (1990 Summary Tape File 3 - where each observation has over 3300 variables) you would do well to just list the first 50 or so variables, since this will include all the geographic and other observation keys. You'll need to see a sample of the variables corresponding to the tables, but once you see these and realize that the variable naming convention is PiIj for population-related tables (where i is the table number, and j the cell number within the table), then you may want to look for other sources of documentation (in the Tools or Docs subdirectories) to help get a handle on these huge and complex files. You might even want to see if you can locate the excellent printed documentation for these files produced by the Bureau.
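
Here is a hedged Python sketch of the two ideas in this section: picking a window of variables by name or number, and decoding a PiIj-style name. How the real program combines the "Start with"/"End with" boxes with the 500-variable limit is not spelled out here, so the truncation below is an assumption, as are the function names.

```python
import re

MAX_VARS = 500  # built-in limit described above


def _index(names, ref, default):
    """Turn a variable name or a 1-based variable number into a 0-based index."""
    if ref is None:
        return default
    if isinstance(ref, int) or str(ref).isdigit():
        return int(ref) - 1
    return [n.upper() for n in names].index(str(ref).upper())


def select_variables(names, start=None, end=None, limit=MAX_VARS):
    """Return the requested window of variables, cut off at `limit` (assumed)."""
    first = _index(names, start, 0)
    last = _index(names, end, len(names) - 1)
    return names[first : last + 1][:limit]


TABLE_CELL = re.compile(r"^P(\d+)I(\d+)$", re.IGNORECASE)


def parse_table_cell(name):
    """Split a PiIj name into (table number, cell number), or None if it does not fit."""
    m = TABLE_CELL.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


names = ["V%d" % i for i in range(1, 3301)]  # stand-in names for a very wide dataset
print(len(select_variables(names)), select_variables(names, 1, 3))
print(parse_table_cell("P12I3"))
```
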

The Display Observation Value Options let you control whether you want your report to contain any sample observation columns. These can be useful in seeing how the data is formatted. By default you will get the first 3 observations (assuming the dataset has that many) listed. You can select the NO button to turn off these columns altogether, or you can modify the starting and/or ending observation numbers to be displayed. A maximum of 5 observation columns can be selected, so saying you want observations 1 to 100 won't work. If you want to see the first 100 observations, try the data extraction option.
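
The rules for the observation columns can be written down as a small Python check. This is a sketch only: this page does not say what the real program does with an oversized request, so raising an error here is an assumption, and the function name is invented.

```python
MAX_OBS_COLUMNS = 5  # maximum number of observation columns, as stated above


def observation_numbers(first=1, last=3, total_obs=None):
    """Return the observation numbers to display as 'Obs n' columns."""
    if first < 1 or last < first:
        raise ValueError("observation range must satisfy 1 <= first <= last")
    if last - first + 1 > MAX_OBS_COLUMNS:
        # Assumed behavior: reject the request rather than silently trimming it.
        raise ValueError("at most %d observation columns" % MAX_OBS_COLUMNS)
    if total_obs is not None:
        last = min(last, total_obs)  # a short dataset yields fewer columns
    return list(range(first, last + 1))


print(observation_numbers())
print(observation_numbers(10, 14))
```
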

The README File Options let you specify how you want to handle any special Readme(.html) files that may be present for a particular dataset. The default is to provide a hyperlink to it so you can jump to it from the screen but it is not part of the generated document. If you plan to print the report and you want the Readme file contents to be a part of it, then you'll want to select the Include any Readme Files option. You also have the option to ignore Readme files altogether using the Don't Include any.. button.

