This dataset, family_name_df, is a data frame of 1,806 Chinese surnames along with their frequency in China. Its 1,806 observations and 7 variables record whether each surname is compound, its Pinyin initial and the rank of that initial, its estimated count and relative frequency per million for 1930–2008, and a uniqueness score. The dataset is useful for sociolinguistic analysis, demography, and historical population studies.

Usage

data(family_name_df)

Format

A data frame with 1806 observations and 7 variables:

surname

Chinese surname (character)

compound

Indicates if the surname is compound (numeric)

initial

Initial letter of surname in Pinyin (character)

initial.rank

Rank of the initial letter (numeric)

n.1930_2008

Estimated number of people with the surname (1930–2008) (numeric)

ppm.1930_2008

Relative frequency per million (1930–2008) (numeric)

surname.uniqueness

Surname uniqueness score (numeric)
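
The columns described above can be explored with a few lines of base R. The following is a minimal sketch rather than part of the package: `top_surnames()` is a hypothetical helper introduced here for illustration, and the example assumes the column names are exactly as listed in this section and that the ChinAPIs package is installed.

```r
# Hypothetical helper (not part of ChinAPIs): return the n most frequent
# surnames, ranked by relative frequency per million.
top_surnames <- function(df, n = 10) {
  ranked <- df[order(df$ppm.1930_2008, decreasing = TRUE), ]
  head(ranked[, c("surname", "n.1930_2008", "ppm.1930_2008")], n)
}

# Run only when ChinAPIs is installed, so the sketch is skipped otherwise.
if (requireNamespace("ChinAPIs", quietly = TRUE)) {
  data("family_name_df", package = "ChinAPIs")
  str(family_name_df)                         # inspect the documented columns
  print(top_surnames(family_name_df, n = 5))  # five most frequent surnames
}
```

Sorting on ppm.1930_2008 rather than n.1930_2008 is an arbitrary choice for this sketch; either column could serve as the ranking key.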

Source

Data taken from the ChineseNames package, version 2023.8.

Details

The dataset keeps the name 'family_name_df' to avoid confusion with other datasets in the R ecosystem. The name identifies the dataset as part of the ChinAPIs package, and the suffix 'df' indicates that it is a data frame. The original content has not been modified in any way.