SPSS Tips & Tricks
If you haven't already, I highly recommend signing up for the SPSS listserv
(e-mail discussion group). It's an outstanding source of help and vicarious
learning, both about using SPSS and about statistics in general
(click here for sign-up instructions).
These pages aim to include tutorial information along with
the syntax pointers, and are updated sporadically. For THE site on everything
SPSS, including syntax help, macros & the rest, check out
Raynald Levesque's archive; he also posts frequently to the list.
Hint -- to download files (like the data sets), hold down
the SHIFT key before clicking on the file name. Your browser should
then ask where you want to save the file.
If you have ideas to add to
these pages, please drop me (Carol Albright) a note:
calbright@visi.com
Contents
What's DATA LIST LIST Do?
If you subscribe (you DO don't you?) to the SPSS Listserv,
you'll notice many examples of solutions start out with a set of commands
somewhat like the following:
DATA LIST LIST /subj_id (f3) visnum (f2) visdt (ADATE11) age (f3.1)
laba (F2.1) labb (F2.1).
BEGIN DATA.
101 1 11/01/1999 23 34.1 33.0
101 2 01/08/2000 24 23.3 22.1
101 3 06/15/2001 25 34.1 33.0
204 1 01/23/2000 46 13.3 24.1
204 2 11/14/2000 47 24.1 33.0
207 1 05/08/2000 36 28.3 27.1
END DATA.
Data doesn't always get typed into a dataset using Access,
Excel or SPSS; sometimes it arrives as a simple text file with each variable
in its own column. The DATA LIST command tells SPSS how
to read such files. For our purposes, it's an easy way for readers to
set up a sample dataset that the remainder of the posting or tutorial uses.
The DATA LIST LIST command tells SPSS that the data are
in a simple list, with one record per row and only a space separating the variables.
The information following the slash (/) gives the names and formats
of the variables. For example, our first variable, SUBJ_ID, is a regular
number (F format) with a width of 3. VISNUM is also a number, but with
a width of up to 2 digits. VISDT is an ADATE11, a date field with a width
of 11. (Search the online SPSS help on Format for all the different formats
available.) Note that LABA and LABB are declared as F2.1, which is too narrow
to display a value like 34.1; that is most likely why the listing further down
shows them without decimals.
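The same command can read an external text file if you add a FILE specification. Here is a minimal sketch, assuming the six sample records were saved as plain text; the file name visits.txt is made up for illustration:

```spss
* Read the sample records from a text file instead of inline data.
* The file name visits.txt is an assumption for illustration.
DATA LIST LIST FILE = 'visits.txt'
  /subj_id (F3) visnum (F2) visdt (ADATE11) age (F3.1)
   laba (F2.1) labb (F2.1).
* LIST prints the cases so you can check they were read correctly.
LIST.
```

With FILE there is no BEGIN DATA / END DATA block, because the records come from the file.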
Everything between the BEGIN DATA. and the END DATA.
commands is our data. Open a syntax window in SPSS, then copy and
paste the block of commands. Highlight all of the lines (i.e. from DATA
LIST through END DATA) and run the commands (CTRL-R is one shortcut). You
should now have a new, Untitled dataset in your data window.
Here's what the data look like:
List
SUBJ_ID  VISNUM  VISDT        AGE  LABA  LABB
    101       1  11/01/1999  23.0    34    33
    101       2  01/08/2000  24.0    23    22
    101       3  06/15/2001  25.0    34    33
    204       1  01/23/2000  46.0    13    24
    204       2  11/14/2000  47.0    24    33
    207       1  05/08/2000  36.0    28    27
Number of cases read: 6    Number of cases listed: 6
Now you're set to do the tutorials in the Cases to Variables
and Variables to Cases sections.
Data Restructuring - File
- Changing a Many-Records-Per-Subject data file
to a One-Record-Per-Subject (Repeated Measures)
- Changing a One-Record-Per-Subject data file
to a Many-Records-Per-Subject
- Creating Variable Labels to match the newly-restructured
file
1. Changing a Many-Records-Per-Subject data file
to a One-Record-Per-Subject (Repeated Measures) (Cases to Variables)
This is the classic "repeated measures" dilemma.
Say you collected a battery of test data on your subjects once a week, and
entered your data so that each record included data from one subject for
one week. This makes data entry and cleaning much easier.
But now you need your data so that each variable only includes data from
one week (say to do paired t-tests or regression).
Let's say we have a medical study (or surveys) where
your subjects come in over time for testing. SUBJ_ID is the Subject ID, VISNUM
is the visit number for that person, VISDT is the Visit Date, and AGE is Age
as of that visit. LABA and LABB are any kind of data, say Blood Cholesterol
or Weight. (Run the syntax in What's DATA LIST LIST Do?
to create the dataset, or download the entire syntax,
Restructure_Data.sps.)
- Restructure Data Wizard (7/12/02, 1/2/03): This is
cool. SPSS 11 has a new command, CASESTOVARS, which you can build
with a menu-driven wizard to restructure your file. Try it! Use
the sample dataset from What's DATA LIST LIST Do? above,
or open up a smallish (50 variables or so) dataset to play with.
Select "Data" and then "Restructure ..." to access the
Wizard. You want to "Restructure Selected Cases Into Variables", so click
on that button & then NEXT>. The "Identifier Variable" is the
unique ID for each subject or respondent, such as Medical Record Number, Patient
ID, Subject ID, SSN and the like. The "Index Variable" tells you which
data collection set the record is from, such as Time Point, Survey Version
or Study Week Number. The next screen asks if you need to sort the
data (tell it YES just to be sure). You can organize your new dataset
by variable (i.e. Q1.1, Q1.2, Q1.3, Q2.1, Q2.2, Q2.3) or by index, which
groups all the data from the first study week, followed by
all the data from the next study week, and so on. Either way works. Sometimes
it's convenient having all the variables in a set, say for using the menus
to set up a MANOVA or t-test.
Let's restructure our little sample dataset -- we want to break the data
apart by SUBJ_ID and VISNUM.
SORT CASES BY subj_id visnum .
CASESTOVARS
/ID = subj_id
/INDEX = visnum
/GROUPBY = VARIABLE
/SEPARATOR = "_"
/COUNT = numcases "Number of cases per subject" .
That's it! Very simple code, and it runs quickly. By default, the new
variables have a period and a number attached. I personally don't like
the . (confusing in syntax), so in our example I specified a different SEPARATOR
(an underscore _ ). You can even ask for no separator at all ("").
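Here is a sketch of the no-separator variant, this time grouping the new variables by index rather than by variable. It assumes you start again from the original many-records-per-subject sample file, not from the file you just restructured:

```spss
* Run this on the original long file, not on the already restructured one.
SORT CASES BY subj_id visnum.
CASESTOVARS
  /ID = subj_id
  /INDEX = visnum
  /GROUPBY = INDEX
  /SEPARATOR = "".
```

The new names should come out without any separator (visdt1, age1 and so on), with all of visit 1's variables first, then visit 2's, then visit 3's.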
TIPS:
Use a numeric variable for the Index Variable, and the shorter the
better, or you'll end up with variable names longer than 8 characters. In
that case, SPSS substitutes a simple 'V1', 'V2' and so on for the too-long
names. If you have a variable name that's already long (7 - 8 characters),
you can rename it to a shorter stem by adding a RENAME line to your syntax:
/RENAME longname = lngnm varname = rootnm oldname = newnm ................
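Applied to our sample dataset, a RENAME line might look like the sketch below. The shorter stem vdt is just an invented name for illustration, and again this assumes you run it on the original long file:

```spss
* Shorten the stem of visdt before the visit number is appended.
* The new stem vdt is only an illustration.
SORT CASES BY subj_id visnum.
CASESTOVARS
  /ID = subj_id
  /INDEX = visnum
  /RENAME visdt = vdt
  /SEPARATOR = "_".
```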
The result is an
Untitled dataset, so it doesn't affect your parent file. SPSS
tacks on the new variable name as part of the variable label, which isn't
terribly informative, so I still relabel them semi-manually (I use Excel
to concatenate information onto old labels, a topic for another tutorial another
Friday). Here's what our practice dataset looks like now:
display variables.
File Information
List of variables on the working file
Name      Pos  Level    Print Fmt  Write Fmt  Missing Values
SUBJ_ID     1  Scale    F3         F3
NUMCASES    2  Ordinal  F4         F8.2
VISDT_1     3  Scale    ADATE11    ADATE11
VISDT_2     4  Scale    ADATE11    ADATE11
VISDT_3     5  Scale    ADATE11    ADATE11
AGE_1       6  Scale    F4.1       F4.1
AGE_2       7  Scale    F4.1       F4.1
AGE_3       8  Scale    F4.1       F4.1
LABA_1      9  Scale    F3.1       F3.1
LABA_2     10  Scale    F3.1       F3.1
LABA_3     11  Scale    F3.1       F3.1
LABB_1     12  Scale    F3.1       F3.1
LABB_2     13  Scale    F3.1       F3.1
LABB_3     14  Scale    F3.1       F3.1
Try the Variables to Cases
tutorial, which takes this new dataset and returns it to its original
Many-Records-Per-Subject format.
- Vector Mania (12/1/00): If you don't have SPSS 11,
this is really slick! Create a record number (say you forgot to enter
what week of treatment the data are from, or only entered the date the data were
collected) using the LAG function. Use the VECTOR command to painlessly
create multiple variables named by the 'record number' (i.e. week):
- SPSS AnswerNet # 100006265: A simple worked example, posted 10/22/00
- Example showing > 1 variable created: Thanks to David
Matheson at SPSS for help with this. Here's a sample program
(Restructure_data.sps) and starting data set (restruct.sav).
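To give a flavor of the approach, here is a rough sketch using the sample dataset from above. It is not the code from the linked examples. The variable recnum and the prefix wlaba are invented names, and only LABA is restructured to keep things short:

```spss
* Sketch only: number each subject's records in date order.
SORT CASES BY subj_id visdt.
COMPUTE recnum = 1.
IF (subj_id = LAG(subj_id)) recnum = LAG(recnum) + 1.

* One new variable per record number: wlaba1, wlaba2, wlaba3.
* The prefix wlaba is a made-up name.
VECTOR wlaba(3).
COMPUTE wlaba(recnum) = laba.

* Collapse to one record per subject, keeping the filled-in value.
AGGREGATE OUTFILE = *
  /BREAK = subj_id
  /wlaba1 wlaba2 wlaba3 = MAX(wlaba1 wlaba2 wlaba3).
```

Each record fills in only the vector element that matches its record number, so taking the MAX within a subject picks up the one non-missing value per new variable. AGGREGATE with OUTFILE = * replaces the working file, so save your original first.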
- The Brute Force Technique (6/21/00): When all else
fails, break your file into multiple data sets, one per week (or whatever
your break criterion is). Rename the variables so that they identify
the source (i.e. change BP to BP_1, BP_2, BP_#, where # identifies the week
the data were collected). Then join the files together by the case ID
(i.e. patient or study ID). Once you've set up the first set of syntax,
the remaining pieces are created mostly by cutting & pasting with a bit
of search & replace.
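The brute force steps might look something like this for the first two visits of the sample dataset. This is a sketch that assumes the long file was saved as visits.sav; all file names here are hypothetical:

```spss
* File names below are hypothetical. Repeat the first block per visit.
GET FILE = 'visits.sav'.
SELECT IF (visnum = 1).
RENAME VARIABLES (visdt age laba labb = visdt_1 age_1 laba_1 labb_1).
SORT CASES BY subj_id.
SAVE OUTFILE = 'visit1.sav' /DROP = visnum.

GET FILE = 'visits.sav'.
SELECT IF (visnum = 2).
RENAME VARIABLES (visdt age laba labb = visdt_2 age_2 laba_2 labb_2).
SORT CASES BY subj_id.
SAVE OUTFILE = 'visit2.sav' /DROP = visnum.

* Join the per-visit files side by side on the subject ID.
MATCH FILES
  /FILE = 'visit1.sav'
  /FILE = 'visit2.sav'
  /BY subj_id.
EXECUTE.
```

MATCH FILES with BY needs every input file sorted by the key, which is why each block sorts by subj_id before saving.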
2. Changing a One-Record-Per-Subject data
file to a Many-Records-Per-Subject (Variables to Cases)
This is the reverse of Cases to Variables.
Say you entered your data with one record per subject, but multiple
repeats per variable, and now you need to calculate an ANOVA by TIMEPT for
a few variables. How you tackle this depends on whether you have v11 or later:
- Using The VARSTOCASES Command (1/2/03): If you
have SPSS v11 or later, use the VARSTOCASES command. You can use the Restructure
Wizard to set up the command (with the Data window on top, click on Data ...,
Restructure ..., Variables to Cases ... from the menus) or, even better, take
a copy of this sample syntax and adapt it to your needs. (I found the VARSTOCASES
wizard pokey to use.) Using our play dataset again after
running CASESTOVARS, we now return it to its original format with VARSTOCASES:
VARSTOCASES
/MAKE visdt "Visit Date" FROM visdt_1 visdt_2 visdt_3
/MAKE age "Age as of Visit Date" FROM age_1 age_2 age_3
/MAKE laba "Lab A Value" FROM laba_1 laba_2 laba_3
/MAKE labb "Lab B Value" FROM labb_1 labb_2 labb_3
/INDEX = visnum "Visit Number"(3)
/KEEP = subj_id numcases
/NULL = DROP.
Here's how the command is set up -- for each grouping
of variables, you need a /MAKE line:
/MAKE newvar "Your Variable Label Here" FROM
oldvar1 oldvar2 oldvar3
So in our example, we need 4 /MAKE lines, one for each set of variables.
Next comes the /INDEX line. visnum is part of the unique
identifier in combination with subj_id. The number in parens (3) tells
how many variables are to be collapsed together. /KEEP = names
the identifier (i.e. subject ID) plus any other non-grouped variables you want
to keep (like demographics, which might appear only once in a record). /NULL
= indicates what to do with records that have no data for the grouped variables;
DROP deletes these records, KEEP leaves them in the new dataset.
Run the command, and here's the result:
display labels.
File Information
List of variables on the working file
Name      Position  Label
SUBJ_ID          1
NUMCASES         2  Number of cases per subject
VISNUM           3  Visit Number
VISDT            4  Visit Date
AGE              5  Age as of Visit Date
LABA             6  Lab A Value
LABB             7  Lab B Value
list all.
List
SUBJ_ID  NUMCASES  VISNUM  VISDT        AGE  LABA  LABB
    101         3       1  11/01/1999  23.0    34    33
    101         3       2  01/08/2000  24.0    23    22
    101         3       3  06/15/2001  25.0    34    33
    204         2       1  01/23/2000  46.0    13    24
    204         2       2  11/14/2000  47.0    24    33
    207         1       1  05/08/2000  36.0    28    27
Number of cases read: 6    Number of cases listed: 6
- Restructuring Prior to version 11:
These two examples assume you only have one data variable, so they need
adapting if you have more than one (i.e. a "score" collected at several time points).
They use the VECTOR and LOOP commands to create the new data set. Your
variables with the data must have a variable name with a number at the end,
i.e. DATAA, DATAB, DATAC need to be renamed to DATA1, DATA2, DATA3.
Check out this worked example at SPSS (Solution ID #100006264).
Here's another example (syntax and resulting
output included) which takes a file used for an inter-rater reliability
problem (calculating an ICC) with ID, JUDGEA, JUDGEB, JUDGEC and converts
it to a file with ID, JUDGE, SCORE with one record per judge per score (rather
than one record per assessment) (2/15/01):
One_to_Many_Example.sps
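For readers without the linked files, here is a rough sketch of the general VECTOR and LOOP idea applied to the inter-rater layout just described. It is not the code from One_to_Many_Example.sps. It refers to the judge variables by position with TO, so it assumes they sit next to each other in the file, and the output file name is made up:

```spss
* Assumes a working file with id, judgea, judgeb, judgec stored next to each other.
* The output file name judges_long.sav is hypothetical.
VECTOR j = judgea TO judgec.
LOOP judge = 1 TO 3.
  COMPUTE score = j(judge).
  XSAVE OUTFILE = 'judges_long.sav' /KEEP = id judge score.
END LOOP.
EXECUTE.
GET FILE = 'judges_long.sav'.
```

XSAVE writes one record on every pass through the loop, so each original record turns into three records in the new file, one per judge.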
3. Creating Variable Labels to match the
newly-restructured file
This trick goes with #1 above. Okay,
now you've expanded your file so you have oodles of new variables for each
original variable. Where you once had ID, BP, PW, ICD (as in Patient
ID, Blood Pressure, Patient Weight, and ICD code) you now have ID, BP1, BP2,
BP3, PW1, PW2, PW3, ICD1, etc. But your VARIABLE LABELS command
no longer works (it's looking for BP, not BP1, BP2, BP3 ...), plus the
labels themselves do not indicate what time period they are from.
This syntax with a macro from Raynald Levesque (Variable_relabel.sps)
creates a new VARIABLE LABELS command so that your new labels can indicate
the source, i.e. "Blood Pressure - Week 1", "Blood Pressure - Week 2" etc.
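For reference, this is the kind of command you would otherwise have to type by hand. It is a hand-written illustration for three of the new variables, not output from the macro:

```spss
* Hand-written labels for the restructured blood pressure variables.
VARIABLE LABELS
  bp1 "Blood Pressure - Week 1"
  /bp2 "Blood Pressure - Week 2"
  /bp3 "Blood Pressure - Week 3".
```

With dozens of variables and several time points this gets tedious quickly, which is the point of generating it instead.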
Data Restructuring - Variables
Short, data restructuring tidbits: Includes example of
converting a multiple-response question entered into a single field (1,3,4)
into 4 fields, one for each option (6/22/00):
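One possible way to do that conversion is sketched below with invented data. The variable names resp and opt1 to opt4 are assumptions, and the trick only works as written when every option code is a single character:

```spss
* Made-up data: resp holds the options each respondent ticked.
DATA LIST LIST /id (F3) resp (A7).
BEGIN DATA.
1 "1,3,4"
2 "2"
3 "1,2,3,4"
END DATA.

* opt1 to opt4 become 1 if the option appears in resp, otherwise 0.
DO REPEAT opt = opt1 TO opt4 /code = "1" "2" "3" "4".
  COMPUTE opt = (INDEX(resp, code) > 0).
END DO REPEAT.
LIST.
```

The responses are quoted in the data block because DATA LIST LIST would otherwise treat the commas as separators between variables.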
Last updated January 2, 2003. Updated sporadically.
/ calbright@visi.com