
Comparative Vocabulary for Colombia's Indigenous Languages
Source: R/data-documentation.R
indigenous_vocabulary_df.Rd
indigenous_vocabulary_df is a data frame containing a comparative vocabulary (a "wordlist") for 69 indigenous languages of Colombia, originally compiled by Huber & Reed (1992). Each row pairs a concept with its counterpart in one language variety, so the same concept can be compared across languages. The dataset supports linguistic, anthropological, and historical research.
Usage
data(indigenous_vocabulary_df)
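As a minimal sketch, the dataset can be loaded and inspected as follows. It assumes the ColombiAPI package is installed; the guard with requireNamespace() is only there so the snippet does not fail when it is not.

```r
# Check whether the ColombiAPI package is available before loading the data
has_pkg <- requireNamespace("ColombiAPI", quietly = TRUE)

if (has_pkg) {
  # Load the dataset into the current environment
  data("indigenous_vocabulary_df", package = "ColombiAPI")

  # Compact overview of the columns and their types
  str(indigenous_vocabulary_df)

  # First few rows
  print(head(indigenous_vocabulary_df))
}
```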
Format
A data frame with 27,521 observations and 4 variables:
- CONCEPT: Gloss or concept represented in the wordlist (factor with 366 levels)
- COUNTERPART: Word corresponding to the concept in the given language (character)
- DOCULECT: Name of the language or variety (factor with 71 levels)
- TOKENS: Tokenized form of the counterpart (character)
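The long format described above (one row per concept and doculect) can be reshaped for side-by-side comparison. The sketch below uses a toy data frame with the same four columns; every value in it is a made-up placeholder, not an entry from the real dataset. It also assumes that TOKENS separates segments with single spaces, which this page does not state.

```r
# Toy stand-in with the same four columns as indigenous_vocabulary_df.
# All values are hypothetical placeholders, not real wordlist entries.
toy <- data.frame(
  CONCEPT     = factor(c("concept_1", "concept_1", "concept_2", "concept_2")),
  COUNTERPART = c("aba", "apa", "tuku", "tugu"),
  DOCULECT    = factor(c("LangA", "LangB", "LangA", "LangB")),
  TOKENS      = c("a b a", "a p a", "t u k u", "t u g u"),
  stringsAsFactors = FALSE
)

# Number of recorded forms per language variety
forms_per_doculect <- table(toy$DOCULECT)

# Wide comparison table: one row per concept, one column per doculect.
# Multiple forms for the same concept and doculect are joined with "; ".
wide <- tapply(
  toy$COUNTERPART,
  list(toy$CONCEPT, toy$DOCULECT),
  function(x) paste(x, collapse = "; ")
)

# Split each TOKENS string into its segments (assumes space-separated tokens)
segments <- strsplit(toy$TOKENS, " ", fixed = TRUE)

print(wide)
```

With the real data, the same calls would apply after replacing toy with indigenous_vocabulary_df; concept and doculect pairs with no recorded form appear as NA in the wide table.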
Details
The dataset is named 'indigenous_vocabulary_df' to avoid confusion with other datasets in the R ecosystem and to identify it as part of the ColombiAPI package. The suffix 'df' indicates that the object is a data frame. The original content has not been modified in any way.