Configurations · Changes

Create Configurations Wiki, authored May 15, 2023 by Tom LIU
Configurations.md 0 → 100644
## Data Preparation
Please refer to <code>data_prep_config.yml</code> for the following configurations.
### Global Configuration
| Setting| Default|Description|
| ------ | ------ |------ |
| iso_code |(various) | ISO code of the language |
### Corpus Configuration
| Setting| Default|Description|
| ------ | ------ |------ |
| raw_data_file |(path_to_file) | Location of the corpus; combine multiple files into a single file first |
| raw_data_type |toolbox | File type of the corpus (supported: toolbox; not yet supported: flex, elan, odin, pangloss) |
### Toolbox Corpus Configuration
| Setting| Default|Description|
| ------ | ------ |------ |
| toolbox_transcript_tier|\\tph | Name of transcript tier in Toolbox file |
| toolbox_morpheme_tier|\\mph| Name of morpheme tier in Toolbox file |
| toolbox_gloss_tier|\\mgl| Name of gloss tier in Toolbox file |
| toolbox_pos_tier|\\ps| Name of part-of-speech tier in Toolbox file |
| toolbox_translation_tier|\\eng| Name of English translation tier in Toolbox file |
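
To illustrate how these tier settings can be used, here is a minimal Python sketch that reads one Toolbox record and picks out the lines for the configured tiers. It assumes that each tier line starts with a single backslash marker followed by whitespace and the tier content, as is usual in Toolbox files. The names `DEFAULT_TIERS` and `parse_toolbox_record` and the sample record are hypothetical and are not taken from the AGGREGATION scripts.

```python
# Default tier markers from the table above. In the Toolbox file itself each
# marker is assumed to be a single backslash followed by the tier name.
DEFAULT_TIERS = {
    "toolbox_transcript_tier": "\\tph",
    "toolbox_morpheme_tier": "\\mph",
    "toolbox_gloss_tier": "\\mgl",
    "toolbox_pos_tier": "\\ps",
    "toolbox_translation_tier": "\\eng",
}


def parse_toolbox_record(record_text, tiers=DEFAULT_TIERS):
    """Return {setting_name: tier_content} for the configured tiers in one record."""
    marker_to_setting = {marker: setting for setting, marker in tiers.items()}
    fields = {}
    for line in record_text.splitlines():
        parts = line.split(None, 1)  # marker, then the rest of the line
        if not parts:
            continue  # skip blank lines
        setting = marker_to_setting.get(parts[0])
        if setting is not None:
            fields[setting] = parts[1].strip() if len(parts) > 1 else ""
    return fields


# Made-up record for illustration only; \ref is not a configured tier and is ignored.
sample = "\\ref 001\n\\tph abc-de\n\\mph abc -de\n\\mgl word -SUFFIX\n\\ps n -sfx\n\\eng a word"
record = parse_toolbox_record(sample)
```

If a corpus uses different markers, only the values in the tier dictionary would need to change; the parsing logic stays the same.
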
### XIGT Configuration
| Setting| Default|Description|
| ------ | ------ |------ |
| raw_to_xigt | true | Set to true to turn on the raw corpus to XIGT conversion |
| raw_to_xigt_path | (path_to_XIGT_file) | Path where the converted XIGT file is saved; only applicable when raw_to_xigt=true |
|xigt_path| (path_to_XIGT_directory)| Directory that stores all XIGT files for this language|
|xigt_file_path| (path_to_XIGT_file) | Path of an existing XIGT file to load; only applicable when raw_to_xigt=false|
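
The relationship between `raw_to_xigt`, `raw_to_xigt_path`, and `xigt_file_path` can be sketched in Python as follows. This is only an illustration of the rule stated in the table, assuming the configuration has been loaded into a flat dictionary; the function `resolve_xigt_file` and the example paths are hypothetical.

```python
def resolve_xigt_file(config):
    """Pick the XIGT file path that applies for the given raw_to_xigt value."""
    raw_to_xigt = config.get("raw_to_xigt", True)  # documented default: true
    # raw_to_xigt=true: the converted file is written to raw_to_xigt_path.
    # raw_to_xigt=false: an existing file is loaded from xigt_file_path.
    key = "raw_to_xigt_path" if raw_to_xigt else "xigt_file_path"
    path = config.get(key)
    if not path:
        raise ValueError(f"{key} must be set when raw_to_xigt is {raw_to_xigt}")
    return path


# Hypothetical example configurations.
convert = {"raw_to_xigt": True, "raw_to_xigt_path": "xigt/new.xml"}
reuse = {"raw_to_xigt": False, "xigt_file_path": "xigt/existing.xml"}
```
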
### XIGT Enrichment Configuration
| Setting| Default|Description|
| ------ | ------ |------ |
| xigt_to_enriched_xigt| true | Set to true to turn on XIGT enrichment |
| enriched_xigt_file_path | (path_to_enrichedXIGT_file) | Path to store the enriched XIGT file |
|split_enriched_xigt|true| Set to true to split the enriched XIGT into 10 training folds and 10 test folds|
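
As a rough picture of what splitting into 10 training folds and 10 test folds can look like, the Python sketch below pairs each test fold with a training fold made of the remaining items. The page does not say how items are assigned to folds, so the round-robin assignment, the function name `ten_fold_split`, and the placeholder item IDs are all assumptions.

```python
def ten_fold_split(items, n_folds=10):
    """Return a list of (train, test) pairs, one pair per fold."""
    folds = []
    for fold in range(n_folds):
        # Item i goes into the test set of fold (i mod n_folds) ...
        test = [item for i, item in enumerate(items) if i % n_folds == fold]
        # ... and into the training set of every other fold.
        train = [item for i, item in enumerate(items) if i % n_folds != fold]
        folds.append((train, test))
    return folds


# Placeholder IDs standing in for the items of an enriched XIGT file.
igt_ids = [f"igt{i}" for i in range(25)]
folds = ten_fold_split(igt_ids)
```

With this scheme every item appears in exactly one test fold, and no item is in both the training and test part of the same fold.
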
### Testsuites Configuration
| Setting| Default|Description|
| ------ | ------ |------ |
| create_testsuites| true | Set to true to create test suites from enriched XIGT |
| testsuites_directory_path | (path_to_testsuits_directory)| Directory to store all test suites|
### AGG Config Preparation
| Setting| Default|Description|
| ------ | ------ |------ |
|prepare_agg_configs|true|Set to true to run AGG config file preparation|
|agg_config_path|(path_to_agg_config_directory)| Directory to store AGG config files|
| collect_tags | true | Set to true to collect the POS tag set from enriched XIGT|
|pos_tag_tier_names| "pos m" | POS tag tier names in enriched XIGT|
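
The data preparation settings include six true/false switches. The Python sketch below lists which of them are on for a given configuration, assuming the configuration is a flat dictionary and that a missing key falls back to the documented default of true. Both assumptions, and the names `STAGE_FLAGS` and `enabled_stages`, are hypothetical; the list order simply follows this page and is not a claim about execution order.

```python
# Boolean switches from the data preparation tables, in the order this page lists them.
STAGE_FLAGS = [
    "raw_to_xigt",
    "xigt_to_enriched_xigt",
    "split_enriched_xigt",
    "create_testsuites",
    "prepare_agg_configs",
    "collect_tags",
]


def enabled_stages(config):
    """List the switches that are on; missing keys are treated as true (assumed)."""
    return [flag for flag in STAGE_FLAGS if config.get(flag, True)]


# Example: reuse an existing XIGT file and skip test suite creation.
stages = enabled_stages({"raw_to_xigt": False, "create_testsuites": False})
```
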
## AGG Inferences Configuration
Please refer to <code>agg_config.yml</code> for the following configurations.
| Setting| Default|Description|
| ------ | ------ |------ |
|agg_config_path|(path_to_agg_config_directory)| Directory of the AGG config files|
|compression_rounds|1|TODO|
|output_dir|(path_to_output_directory)|Directory to output inferred choices file and skipped items|
|graph|false|TODO|
|hyphens|true|TODO|
|precluster|None|TODO|
|glosses|true|TODO|
|compression|0.2|TODO|
|lexitem_classes|true|TODO|
|all_stems_occur_bare| true|TODO|
|ignore_chars| None|TODO|
|ungrammatical| '*!?'|TODO|
|allomorphs| None|TODO|
|boundaries| true|TODO|
|allow_difference| 5|TODO|
|escape_special_characters| true|TODO|
|wordlist| ''|TODO|
|infer_case| gram|TODO|
## Evaluation Preparation
Please refer to <code>eval_config.yml</code> for the following configurations.
| Setting| Default|Description|
| ------ | ------ |------ |
| TODO||