
Chinese Surnames and National Frequency (1930–2008)
Source: R/data-documentation.R
family_name_df.Rd
The family_name_df dataset is a data frame of 1,806 Chinese surnames and their national frequency in China. Its 7 variables record whether each surname is compound, its Pinyin initial and the rank of that initial, the estimated number of bearers and the relative frequency per million for the period 1930–2008, and a uniqueness score. The dataset is useful for sociolinguistic analysis, demography, and historical population studies.
Usage
data(family_name_df)
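After loading, a quick sanity check against the documented Format can catch a missing package or an unexpected version of the data. The sketch below uses a hypothetical helper, check_family_name_df(), which is not exported by ChinAPIs. It assumes the column names and types listed under Format and defaults to the documented 1,806 rows.

```r
# Hypothetical helper (not part of ChinAPIs): compares a data frame with the
# structure documented for family_name_df and returns one logical per check.
check_family_name_df <- function(df, expected_rows = 1806) {
  expected_cols <- c("surname", "compound", "initial", "initial.rank",
                     "n.1930_2008", "ppm.1930_2008", "surname.uniqueness")
  char_cols <- c("surname", "initial")        # documented as character
  num_cols  <- setdiff(expected_cols, char_cols)  # documented as numeric

  is_df   <- is.data.frame(df)
  cols_ok <- is_df && setequal(names(df), expected_cols) &&
    ncol(df) == length(expected_cols)

  c(
    is_data_frame  = is_df,
    row_count      = is_df && nrow(df) == expected_rows,
    column_names   = cols_ok,
    # Type checks run only when all expected columns are present
    character_cols = cols_ok &&
      all(vapply(df[char_cols], is.character, logical(1))),
    numeric_cols   = cols_ok &&
      all(vapply(df[num_cols], is.numeric, logical(1)))
  )
}
```

With ChinAPIs installed, calling check_family_name_df(family_name_df) after data(family_name_df) should return TRUE for every check if the loaded data match the documented Format.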
Format
A data frame with 1806 observations and 7 variables:
- surname: Chinese surname (character)
- compound: Indicates whether the surname is compound (numeric)
- initial: Initial letter of the surname in Pinyin (character)
- initial.rank: Rank of the initial letter (numeric)
- n.1930_2008: Estimated number of people with the surname, 1930–2008 (numeric)
- ppm.1930_2008: Relative frequency per million, 1930–2008 (numeric)
- surname.uniqueness: Surname uniqueness score (numeric)
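These variables support simple frequency queries in base R. The helpers below, top_surnames(), compound_share() and ppm_by_initial(), are hypothetical illustrations rather than ChinAPIs functions. compound_share() assumes that compound is coded 1 for compound surnames and 0 otherwise, which the documentation does not state explicitly.

```r
# Hypothetical helpers (not part of ChinAPIs) for common queries on family_name_df.

# Return the k surnames with the highest relative frequency (ppm.1930_2008).
top_surnames <- function(df, k = 10) {
  ranked <- df[order(df$ppm.1930_2008, decreasing = TRUE),
               c("surname", "n.1930_2008", "ppm.1930_2008")]
  rownames(ranked) <- NULL
  head(ranked, k)
}

# Proportion of surnames flagged as compound.
# ASSUMPTION: compound == 1 marks a compound surname, 0 a single-character one.
compound_share <- function(df) {
  mean(df$compound == 1, na.rm = TRUE)
}

# Total relative frequency (ppm) of all surnames sharing each Pinyin initial,
# sorted from most to least common initial.
ppm_by_initial <- function(df) {
  out <- aggregate(ppm.1930_2008 ~ initial, data = df, FUN = sum)
  out <- out[order(out$ppm.1930_2008, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}
```

For example, once the dataset is loaded, top_surnames(family_name_df, k = 5) lists the five surnames with the highest ppm.1930_2008 values.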
Details
The dataset is named 'family_name_df' to avoid confusion with similarly named datasets in other R packages and to identify it as part of the ChinAPIs package. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.