Nominal category
Introduction to nominal data
A categorical variable assigns each observation in a set of observations to one of a number of qualitative categories. Categorical variables have two types of measurement scales: ordinal and nominal.[1] Ordinal scales have a natural ordering of their levels (for example, ratings of low, medium, and high). Variables measured on such scales are called ordinal variables. Variables with unordered scales are called nominal variables.[1]
A nominal variable, or nominal group, is a group of objects or ideas collectively grouped by a particular qualitative characteristic.[3] The categories of a nominal variable have no natural order, so the order in which they are listed is irrelevant. Statistical analyses of nominal variables produce the same results regardless of that order.[1][3]
Statistical methods designed for ordinal variables cannot be used with nominal variables, because they rely on an ordering that nominal categories lack. Methods designed for nominal variables, however, can be used with both types of categorical data. Treating ordinal data as nominal discards its ordering information, so any further analysis of that dataset can yield only nominal conclusions.[1]
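The difference between nominal and ordinal methods can be illustrated with a minimal Python sketch. The data and the helper names `mode` and `median_under_order` are hypothetical. The mode depends only on category counts. A "median" requires ranking the categories, so two equally arbitrary listings of the same nominal categories give two different answers.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (colors have no natural order).
colors = ["red", "blue", "green", "blue", "red", "blue", "green"]

def mode(values):
    """Most frequent category; needs only counts, never an ordering of categories."""
    return Counter(values).most_common(1)[0][0]

def median_under_order(values, order):
    """Middle element after ranking categories by an externally supplied order.
    Meaningful only when the order is real, i.e. for ordinal data."""
    ranked = sorted(values, key=order.index)
    return ranked[(len(ranked) - 1) // 2]  # lower middle element if n is even

# Two equally arbitrary listings of the same nominal categories.
listing_1 = ["red", "blue", "green"]
listing_2 = ["green", "red", "blue"]

print(mode(colors))                           # blue (no listing involved)
print(median_under_order(colors, listing_1))  # blue
print(median_under_order(colors, listing_2))  # red: the answer follows the arbitrary listing
```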
Valid performable operations on nominal data
Each data point is identified only as a member or a non-member of a nominal group, so it carries no significance beyond that group identification. Identifying membership can also show whether the available information warrants forming new nominal groups.[3] Because nominal categories cannot be numerically organized or ranked, the members of a nominal group cannot be expressed in ordinal or ratio form.
Nominal data is often analyzed alongside ordinal and ratio data to determine whether group membership influences the behavior of quantitative variables.[1][4] For example, the effect of race (nominal) on income (ratio) could be investigated by regressing income on one or more dummy variables that specify race. When nominal variables are used in these contexts, the valid data operations are limited. Arithmetic operations cannot be applied to nominal categories. Neither can measures of central tendency that depend on magnitude or order, such as the mean and median. Valid operations include:

- comparing frequencies and frequency distributions;
- determining the mode;
- creating pivot tables (cross-tabulations);
- applying chi-square goodness-of-fit and independence tests;
- coding and recoding;
- fitting logistic or probit regressions.[1][3][4]
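Several of the operations listed above can be sketched in a few lines of standard-library Python: a frequency distribution, the mode, and a pivot table of joint counts. The records and the helper names `frequency_distribution` and `cross_tabulate` are hypothetical. Only counts are used, and no mean or median is computed.

```python
from collections import Counter

# Hypothetical paired observations of two nominal variables: (region, transport mode).
records = [
    ("north", "bus"), ("north", "car"), ("south", "car"),
    ("south", "car"), ("north", "bus"), ("south", "bike"),
]

def frequency_distribution(values):
    """Map each category to (absolute count, relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}

def cross_tabulate(pairs):
    """Pivot table of joint counts: rows from the first variable, columns from the second.
    Categories are sorted alphabetically only for a stable display; for nominal
    data this order carries no meaning."""
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    counts = Counter(pairs)  # missing (row, col) pairs count as 0
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

transport = [t for _, t in records]
print(frequency_distribution(transport))
print(Counter(transport).most_common(1))  # the mode and its count
rows, cols, table = cross_tabulate(records)
for r, row in zip(rows, table):
    print(r, dict(zip(cols, row)))
```

A table of joint counts like the one returned by `cross_tabulate` is the usual input to a chi-square test of independence.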
Examples and logical analysis of nominal data
As ‘nominal’ suggests, nominal groups are based on the names of the data they encapsulate.[3] For example, citizenship is a nominal group: a person either is or is not a citizen of a given country. A citizen of Canada does not have “more citizenship” than another citizen of Canada. Citizenship therefore cannot be ordered by any mathematical logic.
Another example of name-based categorization is the group of "words that start with the letter 'a'". Thousands of words start with 'a', but none has "more" of this nominal quality than another. What matters is whether a word starts with 'a', not how many of its leading letters are 'a's. The category records membership rather than a quantity that could be ranked, as it would be for an ordinal group.
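Membership in such a group reduces to a yes/no test. The minimal sketch below assumes a case-insensitive comparison, and the function name `starts_with_a` is hypothetical. It partitions a word list into members and non-members. Counting the members is a valid frequency operation, but the test itself yields no degree of membership.

```python
# Hypothetical nominal group: "words that start with the letter 'a'".
# Assumption: the test is case-insensitive.
def starts_with_a(word: str) -> bool:
    """Membership test: True for members of the group, False for non-members."""
    return word.lower().startswith("a")

words = ["apple", "aardvark", "Banana", "axe", "cherry"]

members = [w for w in words if starts_with_a(w)]
non_members = [w for w in words if not starts_with_a(w)]

# "aardvark" opens with two a's, yet it is a member exactly as "axe" is:
# the result is a yes/no label, not a score that could be ranked.
print(members, non_members)
print(len(members))  # counting members (a frequency) is a valid operation
```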
Relating two nominal variables requires care, because some apparent relationships are spurious: two or more variables are wrongly assumed to be associated with one another. The comparison itself may also be unimportant. For example, determining whether proportionally more Canadians than non-Canadians have first names starting with the letter 'a' would be a fairly arbitrary exercise. Comparing frequency distributions to examine whether gender and political affiliation are associated is more informative. In that case the counts of each party affiliation can be compared across the male and female voters recorded in a dataset.
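A comparison of gender and party affiliation like the one described above is commonly formalized as a chi-square test of independence. Below is a minimal pure-Python sketch of Pearson's statistic. The counts, the party labels X, Y, and Z, and the function name `chi_square_independence` are hypothetical.

```python
# Hypothetical counts: rows are gender, columns are party affiliation (parties X, Y, Z).
observed = [
    [30, 20, 10],  # female voters
    [20, 30, 10],  # male voters
]

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square_independence(observed)
CRITICAL_5PCT_DF2 = 5.991  # tabulated chi-square critical value for df = 2, alpha = 0.05
print(f"chi-square = {stat:.3f}, df = {df}")
print("reject independence at 5%:", stat > CRITICAL_5PCT_DF2)  # valid here because df == 2
```

Reordering the rows or columns of the table leaves the statistic unchanged, consistent with the nominal scale. A significant result indicates association, not causation. The approximation is usually considered reliable only when expected counts are not too small; a common rule of thumb is at least 5. SciPy's `scipy.stats.chi2_contingency` reports this statistic together with a p-value. For 2 x 2 tables it applies Yates' continuity correction by default.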
From a quantitative analysis perspective, one of the most common operations performed on nominal data is dummy variable coding, the method introduced above in connection with regression. For example, if a nominal variable has three categories (A, B, and C), two dummy variables are created (for A and B). C then serves as the reference category, the baseline against which the other categories are compared.[6] A closely related technique is indicator variable coding, which assigns each data point a value of 1 if it belongs to a particular group and 0 if it does not.[6] This numerical representation makes nominal data more flexible to analyze. It captures differences between distinct nominal groups, and it allows interactions between nominal variables and other variables to be modeled systematically.[6]
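Dummy coding with C as the reference category can be sketched in pure Python, followed by an ordinary least-squares fit in the spirit of the income-on-race regression mentioned earlier. The data, the labels A, B, and C, and the helpers `dummy_code` and `ols` are hypothetical. In practice a statistics library would handle both steps.

```python
# Hypothetical data: nominal predictor with categories A, B, C and a numeric outcome y.
groups = ["A", "A", "B", "B", "C", "C"]
y = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0]

def dummy_code(values, categories, reference):
    """Indicator (0/1) column for every category except the reference one;
    observations in the reference category get all zeros."""
    kept = [c for c in categories if c != reference]
    return kept, [[1.0 if v == c else 0.0 for c in kept] for v in values]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting (small examples only)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    m = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

names, dummies = dummy_code(groups, ["A", "B", "C"], reference="C")
X = [[1.0] + row for row in dummies]  # prepend an intercept column
coef = ols(X, y)
for name, b in zip(["intercept"] + names, coef):
    print(f"{name}: {b:.2f}")
```

The fitted intercept (21) equals the mean of reference category C. Each dummy coefficient is that category's mean minus C's mean: A gives 11 - 21 = -10 and B gives 16 - 21 = -5. Adding a third dummy for C alongside the intercept would make the columns of X linearly dependent, because every row's dummies would then sum to the intercept column. X'X would become singular, which is why one category is held out as the reference.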
References
1. Agresti, Alan (2007). An Introduction to Categorical Data Analysis. Wiley Series in Probability and Statistics (2nd ed.). Hoboken, N.J.: Wiley-Interscience. ISBN 978-0-471-22618-5.
2. Dahouda, Mwamba Kasongo; Joe, Inwhee (2021). "A Deep-Learned Embedding Technique for Categorical Features Encoding". IEEE Access. 9: 114381–114391. Bibcode:2021IEEEA...9k4381D. doi:10.1109/ACCESS.2021.3104357. ISSN 2169-3536.
3. Rugg, Gordon; Petre, Marian (2006). A Gentle Guide to Research Methods. McGraw-Hill International. ISBN 9780335219278.
4. Reynolds, H. T. (1984). Analysis of Nominal Data. SAGE Publications, Inc. doi:10.4135/9781412983303. ISBN 978-1-4129-8330-3.
5. Reid, Howard M. (2014). Introduction to Statistics: Fundamental Concepts and Procedures of Data Analysis. Los Angeles: SAGE. ISBN 978-1-4522-7196-5.
6. Ryan, Thomas P. (2009). Solutions Manual to Accompany Modern Regression Methods. Wiley Series in Probability and Statistics (2nd ed.). Hoboken, N.J.: Wiley. ISBN 978-0-470-08186-0.