How to Understand Text Labels
You may have seen text labels mentioned in an analysis warning, or you may have encountered unexpected results in an analysis or graph.
In STATISTICA, text data can be stored either as text or with text labels. When text data is entered in a spreadsheet, each unique text string can be assigned a numeric code. So the data are stored both as a number, which is hidden, and the text we see. In this article, we will discuss what text labels are and some of the benefits and common questions associated with text data and the use of text labels.
What Are Text Labels?
Create a new spreadsheet and enter some text in the first cell. When you press ENTER, you are prompted to designate how this text should be treated. The last two options are to Enable Text Labels and Convert to a text only variable.
Select the Enable Text Labels option button, and click OK.
Select the Data tab, and in the Variables group, click Text Labels to display the Text Labels Editor for variable 1. Here we can view the numeric associations for the text entered.
In the global options of STATISTICA (Options dialog box, Navigation/Defaults tab), you can customize the start point for numeric associations for text labels. By default, 101 is the start point. So, while in the spreadsheet we see the text Apple, this cell is also associated with the number 101.
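The association between text and numbers can be pictured with a small Python sketch. This is only an illustration, not how STATISTICA is implemented: the function name `assign_text_labels` is hypothetical, and the sketch assumes that codes are handed out consecutively from the start point in order of first appearance, which the article does not state.

```python
# Illustrative sketch only; not STATISTICA's actual storage format.
def assign_text_labels(values, start=101):
    """Give each unique text string a numeric code.

    Assumption: codes are consecutive, beginning at `start`, in order of
    first appearance.
    """
    labels = {}   # text -> numeric code
    codes = []    # the hidden numeric value stored for each cell
    for text in values:
        if text not in labels:
            labels[text] = start + len(labels)
        codes.append(labels[text])
    return labels, codes


labels, codes = assign_text_labels(["Apple", "Orange", "Apple", "Pear"])
print(labels)
print(codes)
```

Under these assumptions, the first text entered (Apple) is paired with 101, matching the default described above.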
Benefits
Some of the benefits to this text-to-number association are:
Ordinal data can be represented by numbers that show their order as well as text values that have more meaning. For example, high, medium, and low can be stored as 3, 2, and 1, respectively. Their natural order is preserved, and the more descriptive text is present, too. The variable with text labels can be analyzed either as categorical or continuous.
Easy data entry. The numeric associations can easily be modified and used as a shortcut in data entry. For example, when typing in the data, you can type 1 and that value will automatically show Low from the text labels.
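Both benefits can be sketched in a few lines of Python. The names `ORDINAL_LABELS` and `display_value` are hypothetical and exist only for this illustration. The sketch shows that sorting by the stored codes keeps the natural low-to-high order, and that typing a code is enough to display its label.

```python
# Illustrative sketch only; the names below are hypothetical.
ORDINAL_LABELS = {1: "Low", 2: "Medium", 3: "High"}


def display_value(code, labels):
    """Return the text label for a code, or the number itself if unlabeled."""
    return labels.get(code, code)


# Data entry shortcut: type the codes, see the labels.
entered = [1, 3, 2, 1]
shown = [display_value(code, ORDINAL_LABELS) for code in entered]

# Sorting on the codes preserves the natural order of the categories.
ordered = [display_value(code, ORDINAL_LABELS) for code in sorted(entered)]

print(shown)
print(ordered)
```

Sorting the text itself would place High before Low alphabetically, which is why the numeric code is useful for ordinal data.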
Common Questions Associated with Text Labels
Following are answers to common questions from STATISTICA users when text labels are employed in their data.
When a variable is selected for analysis as a continuous variable (in basic descriptive statistics for example) and that variable has text labels, the following warning dialog box is displayed.
This does not mean the analysis can’t proceed. It simply brings to your attention the fact that the analysis you are about to perform may be suspect. Consider the previous example where 1 to 3 represent low to high. We can compute a mean, standard deviation, etc., on these data because the numbers 1 to 3 are used in the mathematical formulas. This warning dialog box prompts you to examine whether this analysis makes sense with the data you have selected. If so, select the option to continue. If not, you can further explore the variables containing text labels with the Scan Spreadsheet option.
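As a rough illustration of why the analysis can still run, the Python sketch below computes descriptive statistics on the hidden codes and also reports whether any of them carry a text label. The function `describe` and its returned keys are hypothetical; the flag only loosely mirrors the idea behind the warning and is not the software's actual check.

```python
from statistics import mean, stdev


# Illustrative sketch only; `describe` is a hypothetical helper.
def describe(codes, labels):
    """Summarize the numeric codes and flag the presence of text labels."""
    return {
        "has_text_labels": any(code in labels for code in codes),
        "mean": mean(codes),
        "sd": stdev(codes),  # sample standard deviation
    }


labels = {1: "Low", 2: "Medium", 3: "High"}
codes = [1, 2, 3, 3, 2]
summary = describe(codes, labels)
print(summary)
```

The statistics are computed from 1, 2, and 3 regardless of the text shown in the spreadsheet, so it is up to the analyst to decide whether a mean of ordinal codes is meaningful.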
In numeric data, suppose you inadvertently typed in some text, or on import, perhaps the row of variable names was incorrectly read in as the first row of data. Now, a text label and number combination is used in this column. Deleting the offending case is only one step in fixing this issue. The text label, although not used, is still there. This will cause the warning dialog above to be displayed in analysis. The software does not know that the text label was a mistake. Using the Text Labels Editor, the unwanted text label can be removed.
Another potential problem stemming from accidental text labels is unexpected text popping up in your numeric data. Because of a data entry or import error, a number is assigned a text label. Now, when that number naturally occurs in the data, the number is hidden by the unwanted text label. The root cause of the issue and the fix are the same, but the symptoms are different.
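The two symptoms described above can be reproduced with a small Python sketch. It assumes a hypothetical scenario in which the variable name "Weight" was read in as data and received the default code 101. The helpers `display` and `unused_labels` are invented for this illustration and do not correspond to any STATISTICA function.

```python
# Illustrative sketch only; helper names and data are hypothetical.
def display(column, labels):
    """Show each stored value the way a labeled spreadsheet would."""
    return [labels.get(value, value) for value in column]


def unused_labels(column, labels):
    """Return labels whose numeric code no longer appears in the column."""
    used = set(column)
    return {code: text for code, text in labels.items() if code not in used}


labels = {101: "Weight"}            # accidental label from a header row

# Symptom 1: the bad case was deleted, but the label is still defined.
after_delete = [98, 104, 99]
stale = unused_labels(after_delete, labels)

# Symptom 2: a genuine 101 in the data is hidden behind the label.
with_natural_101 = [98, 101, 99]
hidden = display(with_natural_101, labels)

# The fix in both cases is the same: remove the unwanted label.
cleaned = {code: text for code, text in labels.items() if code not in stale}
fixed = display(with_natural_101, cleaned)

print(stale, hidden, fixed)
```

In both cases the remedy is to remove the unwanted label definition, not just the case that introduced it.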
One final possible symptom is unexpected values in graphs and analyses.
This plot shows what happens when numeric data, on a scale of 0 to 1, are plotted in a histogram, but one case has an unexpected text label. The numeric value associated with the text in this graph is the default 101. The data look skewed, as a very extreme outlier is present. This is simply a data entry error that is masked by text labels. Using the Text Labels Editor, you can further explore this error.
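A quick way to picture this situation is the Python sketch below, which uses made-up values on a 0 to 1 scale plus one case that carries a stray label with the default code 101. The helper `labeled_cases`, the label text "n/a", and the data are all hypothetical.

```python
from statistics import mean


# Illustrative sketch only; data and names are hypothetical.
def labeled_cases(column, labels):
    """List (row index, stored value, label text) for cases with a label."""
    return [
        (row, value, labels[value])
        for row, value in enumerate(column)
        if value in labels
    ]


labels = {101: "n/a"}                      # stray text entered by mistake
column = [0.12, 0.50, 0.87, 101, 0.33]     # intended scale is 0 to 1

suspects = labeled_cases(column, labels)
mean_all = mean(column)
mean_clean = mean(v for v in column if v not in labels)

print(suspects)
print(mean_all, mean_clean)
```

Listing the labeled cases points directly at the row responsible for the extreme value, which is the same information the editor exposes.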
Conclusion
When properly understood and used correctly, text labels are a good tool for data storage. This understanding of the way text labels work can help all STATISTICA users to improve their data integrity.
