The top1000name_prov_df dataset is a data frame of the 1,000 most common given names across the 31 provincial-level regions of mainland China. It contains 999 observations and 35 variables: the given name, counts by gender (n.male, n.female), one count column per province, and an others column for unspecified or other regions. These columns allow geographic comparison of name popularity and of sociocultural naming trends across Chinese regions.

Usage

data(top1000name_prov_df)

Format

A data frame with 999 observations and 35 variables:

name

Given name (character)

n.male

Number of males with this name (numeric)

n.female

Number of females with this name (numeric)

beijing

Name frequency in Beijing (numeric)

tianjin

Name frequency in Tianjin (numeric)

hebei

Name frequency in Hebei (numeric)

shanxi

Name frequency in Shanxi (numeric)

neimenggu

Name frequency in Inner Mongolia (numeric)

liaoning

Name frequency in Liaoning (numeric)

jilin

Name frequency in Jilin (numeric)

heilongjiang

Name frequency in Heilongjiang (numeric)

shanghai

Name frequency in Shanghai (numeric)

jiangsu

Name frequency in Jiangsu (numeric)

zhejiang

Name frequency in Zhejiang (numeric)

anhui

Name frequency in Anhui (numeric)

fujian

Name frequency in Fujian (numeric)

jiangxi

Name frequency in Jiangxi (numeric)

shandong

Name frequency in Shandong (numeric)

henan

Name frequency in Henan (numeric)

hubei

Name frequency in Hubei (numeric)

hunan

Name frequency in Hunan (numeric)

guangdong

Name frequency in Guangdong (numeric)

guangxi

Name frequency in Guangxi (numeric)

hainan

Name frequency in Hainan (numeric)

chongqing

Name frequency in Chongqing (numeric)

sichuan

Name frequency in Sichuan (numeric)

guizhou

Name frequency in Guizhou (numeric)

yunnan

Name frequency in Yunnan (numeric)

xizang

Name frequency in Tibet (numeric)

shaanxi

Name frequency in Shaanxi (numeric)

gansu

Name frequency in Gansu (numeric)

qinghai

Name frequency in Qinghai (numeric)

ningxia

Name frequency in Ningxia (numeric)

xinjiang

Name frequency in Xinjiang (numeric)

others

Name frequency in unspecified or other regions (numeric)
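
The columns listed above lend themselves to simple per-name and per-province summaries. The following is a minimal sketch, not part of the package: toy, female_share() and top_names_in() are hypothetical names, and the toy values are invented stand-ins that only reuse a few of the documented column names.

```r
# Toy stand-in with a few of the documented columns. The names and counts are
# invented for illustration and are NOT values from the real dataset.
toy <- data.frame(
  name      = c("nameA", "nameB", "nameC"),
  n.male    = c(120, 10, 60),
  n.female  = c(30, 90, 60),
  beijing   = c(50, 20, 40),
  shanghai  = c(40, 50, 30),
  guangdong = c(60, 30, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of each name's bearers who are female.
female_share <- function(df) {
  df$n.female / (df$n.male + df$n.female)
}

# Hypothetical helper: the n most frequent names in one province column.
top_names_in <- function(df, province, n = 2) {
  stopifnot(province %in% names(df))
  ranked <- df[order(df[[province]], decreasing = TRUE), ]
  head(ranked$name, n)
}

toy$female_share <- female_share(toy)
print(toy[, c("name", "female_share")])
print(top_names_in(toy, "shanghai"))
```

With the package installed, the same helpers should accept top1000name_prov_df after data(top1000name_prov_df), since they rely only on the column names documented in this section.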

Source

Data taken from the ChineseNames package, version 2023.8.

Details

The dataset name has been kept as 'top1000name_prov_df' to avoid confusion with other datasets in the R ecosystem. The name marks the dataset as part of the ChinAPIs package, and the suffix 'df' indicates that it is a data frame. The original content has not been modified in any way.