SPSS Tips & Tricks

        If you haven't already, I highly recommend signing up for the SPSS listserv (e-mail discussion group). It's an outstanding source of help and vicarious learning, both about using SPSS and about statistics in general.

   These pages aim to include tutorial information along with the syntax pointers, and are updated sporadically.  For THE site on everything SPSS, including syntax help, macros and the rest, check out Raynald Levesque's archive; he also posts frequently to the list.

 Hint -- to download files (like the data sets), hold down the SHIFT key before clicking on the file name.  Your browser should then ask where you want to save the file.

        If you have ideas to add to these pages, please drop me (Carol Albright) a note:  calbright@visi.com



What's DATA LIST LIST Do?

    If you subscribe to the SPSS listserv (you DO, don't you?), you'll notice that many example solutions start with a set of commands like the following:

DATA LIST LIST /subj_id (f3) visnum (f2) visdt (ADATE11) age (f3.1) laba (F2.1) labb (F2.1).
BEGIN DATA.
101 1 11/01/1999 23 34.1 33.0
101 2 01/08/2000 24 23.3 22.1
101 3 06/15/2001 25 34.1 33.0
204 1 01/23/2000 46 13.3 24.1
204 2 11/14/2000 47 24.1 33.0
207 1 05/08/2000 36 28.3 27.1
END DATA.


    Sometimes, rather than being typed into Access, Excel or SPSS, data arrive as a simple text file with the variables laid out in columns.  The DATA LIST command tells SPSS how to read such files.  For our purposes, it's an easy way for readers to set up a sample dataset that the rest of the posting or tutorial uses.

    The DATA LIST LIST command tells SPSS that the data are in a simple list, with one record per row and the values separated by spaces (commas also work).  The information following the slash (/) gives the names and formats of the variables.  For example, our first variable, SUBJ_ID, uses the standard numeric format (F) with a width of 3.  VISNUM is also a number, but with a width of up to 2 digits.  VISDT is an ADATE11, an American-style date (mm/dd/yyyy) with a width of 11.  Check the online SPSS help for all the available formats (search on Format).

   Everything between BEGIN DATA. and END DATA. is our data.  Open a syntax window in SPSS and paste in the block of commands.  Highlight all of the lines (i.e. from DATA LIST through END DATA) and run them (CTRL-R is one shortcut).  You should now have a new, Untitled dataset in your data window.

    Here's what the data look like:

List
 
SUBJ_ID VISNUM       VISDT  AGE LABA LABB

  101      1   11/01/1999  23.0   34   33
  101      2   01/08/2000  24.0   23   22
  101      3   06/15/2001  25.0   34   33
  204      1   01/23/2000  46.0   13   24
  204      2   11/14/2000  47.0   24   33
  207      1   05/08/2000  36.0   28   27


Number of cases read:  6    Number of cases listed:  6
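    One thing to notice in the listing: LABA and LABB print as 34, 23 and so on rather than 34.1, 23.3.  A format of F2.1 is too narrow to display 34.1, so the decimal place is dropped from the display.  If you want to see the decimals, a minimal fix is to give those two variables wider formats when you read the data.  This is a sketch; the only change from the block above is F4.1 instead of F2.1 for LABA and LABB:

```spss
* Same practice data; only the LABA and LABB formats are wider (F4.1 instead of F2.1).
DATA LIST LIST /subj_id (f3) visnum (f2) visdt (ADATE11) age (f3.1) laba (F4.1) labb (F4.1).
BEGIN DATA.
101 1 11/01/1999 23 34.1 33.0
101 2 01/08/2000 24 23.3 22.1
101 3 06/15/2001 25 34.1 33.0
204 1 01/23/2000 46 13.3 24.1
204 2 11/14/2000 47 24.1 33.0
207 1 05/08/2000 36 28.3 27.1
END DATA.
LIST.
```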


    Now you're set to do the tutorials in the Cases to Variables and Variables to Cases sections.


Data Restructuring - File

  1. Changing a Many-Records-Per-Subject data file to a One-Record-Per-Subject (Repeated Measures)
  2. Changing a One-Record-Per-Subject data file to a Many-Records-Per-Subject
  3. Creating Variable Labels to match the newly-restructured file
1.    Changing a Many-Records-Per-Subject data file to a One-Record-Per-Subject (Repeated Measures) (Cases to Variables)

    This is the classic "repeated measures" dilemma.  Say you collected a battery of test data on your subjects once a week, and entered your data so that each record held one subject's data for one week.  This makes data entry and cleaning much easier.  But now you need your data arranged so that each variable holds data from only one week (say, to do paired t-tests or regression).

    Let's say we have a medical study (or survey) where subjects come in over time for testing.  SUBJ_ID is the Subject ID, VISNUM is the visit number for that person, VISDT is the Visit Date, and AGE is Age as of that visit.  LABA and LABB are any kind of data, say Blood Cholesterol or Weight.  (Run the syntax in What's DATA LIST LIST Do? to create the dataset, or download the entire syntax, Restructure_Data.sps.)

    Select "Data" and then "Restructure..." to open the Restructure Data Wizard.  You want to "Restructure Selected Cases Into Variables", so click that button and then NEXT>.  The "Identifier Variable" is the unique ID for each subject or respondent, such as Medical Record Number, Patient ID, Subject ID, SSN and the like.  The "Index Variable" tells you which data collection set the record is from, such as Time Point, Survey Version or Study Week Number.  The next screen asks if you need to sort the data (tell it YES just to be sure).  You can organize your new dataset by variable (e.g. Q1.1, Q1.2, Q1.3, Q2.1, Q2.2, Q2.3) or by index, which groups all the data from, say, the first study week, followed by all the data from the next study week (Q1.1, Q2.1, Q1.2, Q2.2, Q1.3, Q2.3).  Either way works; sometimes it's convenient to have all the variables in a set together, say for using the menus to set up a MANOVA or t-test.

                Let's restructure our little sample dataset -- we want to break the data apart by SUBJ_ID and VISNUM.  

SORT CASES BY subj_id visnum .
CASESTOVARS
 /ID = subj_id
 /INDEX = visnum
 /GROUPBY = VARIABLE
 /SEPARATOR = "_"
 /COUNT = numcases "Number of cases per subject" .

            That's it!  Very simple code, and it runs quickly.  By default, the new variables get a period and the index number attached (LABA.1, LABA.2 and so on).  I personally don't like the period (it's confusing in syntax), so in our example I specified a different SEPARATOR, an underscore (_).  You can also use no separator at all (SEPARATOR = "").

            TIPS:  Use a numeric variable with short values for the Index Variable, and keep your variable names short, or you'll end up with new variable names longer than 8 characters.  In that case, SPSS substitutes simple names ('V1', 'V2' and so on) for the too-long ones.  If a variable name is already long (7 or 8 characters), you can give it a shorter stem by adding a RENAME line to your syntax:

                        /RENAME longname = lngnm varname = rootnm oldname = newnm ...
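            Here's what that might look like in a complete command.  This is a sketch: CHOLESTR is a hypothetical 8-character variable (it's not in our practice dataset), so CHOLESTR_1 would be too long; renaming its stem to CHOL should give CHOL_1, CHOL_2 and so on instead.

```spss
* Hypothetical: the long file also contains an 8-character variable CHOLESTR.
* RENAME gives it the shorter stem CHOL, so the new names become CHOL_1, CHOL_2 and so on.
SORT CASES BY subj_id visnum .
CASESTOVARS
 /ID = subj_id
 /INDEX = visnum
 /GROUPBY = VARIABLE
 /SEPARATOR = "_"
 /RENAME cholestr = chol .
```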

         The result is an Untitled dataset, so your parent file isn't affected.  SPSS tacks the new variable name onto the variable label, which isn't terribly informative, so I still relabel them semi-manually (I use Excel to concatenate information onto the old labels, a topic for another tutorial on another Friday).  Here's what our practice dataset looks like now:
 
display variables.
 
 
File Information
 
            List of variables on the working file

Name       Pos  Level    Print Fmt     Write Fmt     Missing Values

SUBJ_ID      1  Scale    F3            F3
NUMCASES     2  Ordinal  F4            F8.2
VISDT_1      3  Scale    ADATE11       ADATE11
VISDT_2      4  Scale    ADATE11       ADATE11
VISDT_3      5  Scale    ADATE11       ADATE11
AGE_1        6  Scale    F4.1          F4.1
AGE_2        7  Scale    F4.1          F4.1
AGE_3        8  Scale    F4.1          F4.1
LABA_1       9  Scale    F3.1          F3.1
LABA_2      10  Scale    F3.1          F3.1
LABA_3      11  Scale    F3.1          F3.1
LABB_1      12  Scale    F3.1          F3.1
LABB_2      13  Scale    F3.1          F3.1
LABB_3      14  Scale    F3.1          F3.1
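The listing above is grouped by variable (all the VISDT columns, then all the AGE columns, and so on).  If you'd rather have the index grouping described earlier, the only change is the GROUPBY keyword.  A sketch, to be run on the original long file (re-run the DATA LIST block first, since CASESTOVARS needs one record per visit):

```spss
* Run on the original long practice file, not on the restructured one.
SORT CASES BY subj_id visnum .
CASESTOVARS
 /ID = subj_id
 /INDEX = visnum
 /GROUPBY = INDEX
 /SEPARATOR = "_"
 /COUNT = numcases "Number of cases per subject" .
DISPLAY VARIABLES.
```

The variables should then come out as VISDT_1, AGE_1, LABA_1, LABB_1, VISDT_2 and so on.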


Try the Variables to Cases tutorial, which takes this new dataset and returns it to the Many-Records-Per-Subject format.
  1. SPSS AnswerNet #100006265: a simple worked example, posted 10/22/00.
  2. Example showing more than one variable created (thanks to David Matheson at SPSS for help with this): here's a sample program (Restructure_data.sps) and starting data set (restruct.sav).
2.    Changing a One-Record-Per-Subject data file to a Many-Records-Per-Subject (Variables to Cases)

    This is the reverse of Cases to Variables.  Say you entered your data with one record per subject, with each repeated measurement in its own variable, and now you need to calculate an ANOVA by TIMEPT for a few variables.  How you tackle this depends on whether you have v11 or later.  In v11 and later, the VARSTOCASES command does it in one step:

VARSTOCASES 
 /MAKE visdt "Visit Date" FROM visdt_1 visdt_2 visdt_3
 /MAKE age "Age as of Visit Date" FROM age_1 age_2 age_3
 /MAKE laba "Lab A Value" FROM laba_1 laba_2 laba_3
 /MAKE labb "Lab B Value" FROM labb_1 labb_2 labb_3
 /INDEX = visnum "Visit Number"(3)
 /KEEP =  subj_id numcases
 /NULL = DROP.


    Here's how the command is set up -- for each grouping of variables, you need a /MAKE line:

    /MAKE newvar "Your Variable Label Here" FROM oldvar1 oldvar2 oldvar3

So in our example we need four /MAKE lines, one for each set of variables.  Next comes the /INDEX line: VISNUM is the new index variable, which together with SUBJ_ID uniquely identifies each record.  The number in parentheses (3) is the number of index values to create, which should match the number of variables in each /MAKE list.  /KEEP = names the identifier (i.e. subject ID) plus any other ungrouped variables you want to carry along (like demographics, which appear only once per record).  /NULL = says what to do with new records that have no data for the grouped variables: DROP deletes these records, KEEP leaves them in the new dataset.
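    To see what /NULL does, here's a stripped-down sketch that unstacks only the Lab A values from the wide file created in the Cases to Variables tutorial, and keeps the empty visit slots.  Because subject 204 has no visit 3 and subject 207 has only visit 1, you'd expect nine records (three per subject), with LABA missing in three of them; with /NULL = DROP those three empty records would disappear.

```spss
* Run on the wide file from the Cases to Variables tutorial.
* Unstacks only the Lab A values and keeps the empty visit slots.
VARSTOCASES
 /MAKE laba "Lab A Value" FROM laba_1 laba_2 laba_3
 /INDEX = visnum(3)
 /KEEP = subj_id
 /NULL = KEEP.
LIST.
```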

    Run the command, and here's the result:

display labels.
 
 
File Information
 
            List of variables on the working file

Name     Position  Label

SUBJ_ID         1
NUMCASES        2  Number of cases per subject
VISNUM          3  Visit Number
VISDT           4  Visit Date
AGE             5  Age as of Visit Date
LABA            6  Lab A Value
LABB            7  Lab B Value


 
list all.
 
 
List
 
SUBJ_ID NUMCASES VISNUM       VISDT  AGE LABA LABB

  101        3       1  11/01/1999  23.0   34   33
  101        3       2  01/08/2000  24.0   23   22
  101        3       3  06/15/2001  25.0   34   33
  204        2       1  01/23/2000  46.0   13   24
  204        2       2  11/14/2000  47.0   24   33
  207        1       1  05/08/2000  36.0   28   27


Number of cases read:  6    Number of cases listed:  6


            If you have v10 or earlier (no VARSTOCASES), the two examples below do the job.  They assume you have only one repeated variable (e.g. a "score" collected at several time points), so they need adapting if you have more than one.  They use VECTOR and LOOP commands to create the new data set.  The variables holding the data must have names ending in a number, e.g. DATAA, DATAB, DATAC need to be renamed to DATA1, DATA2, DATA3.

    Check out this worked example at SPSS (Solution ID #100006264).
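    Neither example is reproduced here, but the general shape of the VECTOR/LOOP approach looks roughly like this.  It's a sketch of the technique, not the code from either example: ID and DATA1, DATA2, DATA3 are hypothetical variable names (stored next to each other in the file, since DATA1 TO DATA3 is a positional range), and the output path is just a placeholder.  XSAVE inside the loop writes one record per time point to a new file, which you then open.

```spss
* Pre-v11 sketch: one record per subject becomes one record per time point.
* Hypothetical variables ID and DATA1 TO DATA3; c:\temp\long.sav is only a placeholder path.
VECTOR d = data1 TO data3.
LOOP timept = 1 TO 3.
COMPUTE score = d(timept).
XSAVE OUTFILE = 'c:\temp\long.sav' /KEEP = id timept score.
END LOOP.
EXECUTE.
GET FILE = 'c:\temp\long.sav'.
```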

    Here's another example (syntax and resulting output included) which takes a file used for an inter-rater reliability problem (calculating an ICC) with ID, JUDGEA, JUDGEB, JUDGEC and converts it to a file with ID, JUDGE, SCORE, with one record per judge's score rather than one record per assessment (2/15/01):  One_to_Many_Example.sps
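    If you have v11 or later, the same conversion can be done with VARSTOCASES.  A sketch, assuming a file with just ID, JUDGEA, JUDGEB and JUDGEC; the "Rating" label and the value labels are my own choices.  No renaming to DATA1-style names is needed here.

```spss
* v11+ sketch: ID, JUDGEA, JUDGEB, JUDGEC (one record per subject) to ID, JUDGE, SCORE.
VARSTOCASES
 /MAKE score "Rating" FROM judgea judgeb judgec
 /INDEX = judge(3)
 /KEEP = id
 /NULL = KEEP.
* JUDGE is numbered 1 to 3 in the order the variables are listed on /MAKE.
VALUE LABELS judge 1 "Judge A" 2 "Judge B" 3 "Judge C".
```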

3.    Creating Variable Labels to match the newly-restructured file

    This trick goes with #1 above.  Okay, now you've expanded your file so you have oodles of new variables for each original variable.  Where you once had ID, BP, PW, ICD (as in Patient ID, Blood Pressure, Patient Weight, and ICD code) you now have ID, BP1, BP2, BP3, PW1, PW2, PW3, ICD1 and so on.  But your VARIABLE LABELS command no longer works (it's looking for BP, not BP1, BP2, BP3 ...), and the labels themselves don't indicate which time period they are from.

    This syntax with a macro from Raynald Levesque (Variable_relabel.sps) creates a new VARIABLE LABELS command so that your new labels can indicate the source, i.e. "Blood Pressure - Week 1", "Blood Pressure - Week 2" etc.
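    For a small file you can also just write the relabeling by hand.  A sketch for the practice dataset after the CASESTOVARS run in #1 (underscore separator); the label wording is borrowed from the VARSTOCASES example and is only a suggestion.

```spss
* Hand-written labels for the restructured practice file (SEPARATOR = "_").
VARIABLE LABELS
 visdt_1 "Visit Date - Visit 1" /visdt_2 "Visit Date - Visit 2" /visdt_3 "Visit Date - Visit 3"
 /age_1 "Age as of Visit Date - Visit 1" /age_2 "Age as of Visit Date - Visit 2"
 /age_3 "Age as of Visit Date - Visit 3"
 /laba_1 "Lab A Value - Visit 1" /laba_2 "Lab A Value - Visit 2" /laba_3 "Lab A Value - Visit 3"
 /labb_1 "Lab B Value - Visit 1" /labb_2 "Lab B Value - Visit 2" /labb_3 "Lab B Value - Visit 3".
DISPLAY LABELS.
```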



Data Restructuring - Variables

Short data restructuring tidbits, including an example of converting a multiple-response question entered into a single field (1,3,4) into 4 fields, one for each option (6/22/00).
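That example isn't reproduced here, but one way to do the split looks roughly like this.  It's a sketch under several assumptions: a hypothetical string variable Q5 holds answers such as "1,3,4", the options are the single digits 1 through 4, and the hypothetical new variables OPT1 to OPT4 are coded 1 if the option was chosen and 0 if not.  Because it searches for the digit anywhere in the string, it won't work as-is for options numbered 10 or higher ("1" also matches "10").

```spss
* Hypothetical string variable Q5 holds answers such as "1,3,4".
* VECTOR creates four new numeric variables, OPT1 TO OPT4.
* Each is set to 1 if that option's digit appears anywhere in Q5, else 0.
VECTOR opt(4).
LOOP #i = 1 TO 4.
COMPUTE opt(#i) = (INDEX(q5, STRING(#i, F1.0)) > 0).
END LOOP.
EXECUTE.
FORMATS opt1 TO opt4 (F1.0).
```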

Last updated January 2, 2003. Updated sporadically. / calbright@visi.com