
Chinese Given Name Characters and Frequency (1930–2008)
Source:R/data-documentation.R
given_name_df.Rd
This dataset, given_name_df, is a data frame containing 2,614 Chinese characters commonly used in given names, along with nationwide frequency data. The dataset includes 2614 observations and 25 variables, providing information such as stroke count, gender distribution, historical usage, frequency per million, uniqueness, and perceived name traits such as warmth and competence.
Usage
data(given_name_df)
Format
A data frame with 2614 observations and 25 variables:
- character
Chinese character used in given names (character)
- pinyin
Pronunciation in Pinyin (character)
- bihua
Number of strokes in the character (numeric)
- n.male
Number of males with this character in their name (numeric)
- n.female
Number of females with this character in their name (numeric)
- name.gender
Gender index (numeric)
- n.1930_1959
Number of occurrences between 1930–1959 (numeric)
- n.1960_1969
Number of occurrences between 1960–1969 (numeric)
- n.1970_1979
Number of occurrences between 1970–1979 (numeric)
- n.1980_1989
Number of occurrences between 1980–1989 (numeric)
- n.1990_1999
Number of occurrences between 1990–1999 (numeric)
- n.2000_2008
Number of occurrences between 2000–2008 (numeric)
- ppm.1930_1959
Frequency per million (1930–1959) (numeric)
- ppm.1960_1969
Frequency per million (1960–1969) (numeric)
- ppm.1970_1979
Frequency per million (1970–1979) (numeric)
- ppm.1980_1989
Frequency per million (1980–1989) (numeric)
- ppm.1990_1999
Frequency per million (1990–1999) (numeric)
- ppm.2000_2008
Frequency per million (2000–2008) (numeric)
- name.ppm
Overall frequency per million (numeric)
- name.uniqueness
Uniqueness score of the name (numeric)
- corpus.ppm
Frequency in linguistic corpus (numeric)
- corpus.uniqueness
Uniqueness in corpus (numeric)
- name.valence
Emotional valence of the name (numeric)
- name.warmth
Perceived warmth of the name (numeric)
- name.competence
Perceived competence of the name (numeric)
Details
The dataset name has been kept as 'given_name_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the ChinAPIs package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.