Skip to contents

This dataset, given_name_df, is a data frame containing 2,614 Chinese characters commonly used in given names, along with nationwide frequency data. The dataset includes 2614 observations and 25 variables, providing information such as stroke count, gender distribution, historical usage, frequency per million, uniqueness, and perceived name traits such as warmth and competence.

Usage

data(given_name_df)

Format

A data frame with 2614 observations and 25 variables:

character

Chinese character used in given names (character)

pinyin

Pronunciation in Pinyin (character)

bihua

Number of strokes in the character (numeric)

n.male

Number of males with this character in their name (numeric)

n.female

Number of females with this character in their name (numeric)

name.gender

Gender index (numeric)

n.1930_1959

Number of occurrences between 1930–1959 (numeric)

n.1960_1969

Number of occurrences between 1960–1969 (numeric)

n.1970_1979

Number of occurrences between 1970–1979 (numeric)

n.1980_1989

Number of occurrences between 1980–1989 (numeric)

n.1990_1999

Number of occurrences between 1990–1999 (numeric)

n.2000_2008

Number of occurrences between 2000–2008 (numeric)

ppm.1930_1959

Frequency per million (1930–1959) (numeric)

ppm.1960_1969

Frequency per million (1960–1969) (numeric)

ppm.1970_1979

Frequency per million (1970–1979) (numeric)

ppm.1980_1989

Frequency per million (1980–1989) (numeric)

ppm.1990_1999

Frequency per million (1990–1999) (numeric)

ppm.2000_2008

Frequency per million (2000–2008) (numeric)

name.ppm

Overall frequency per million (numeric)

name.uniqueness

Uniqueness score of the name (numeric)

corpus.ppm

Frequency in linguistic corpus (numeric)

corpus.uniqueness

Uniqueness in corpus (numeric)

name.valence

Emotional valence of the name (numeric)

name.warmth

Perceived warmth of the name (numeric)

name.competence

Perceived competence of the name (numeric)

Source

Data taken from the ChineseNames package version 2023.8

Details

The dataset name has been kept as 'given_name_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the ChinAPIs package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.