[Feature request]: Top Level files Section To Contain Central Repository For File Backed Datasets #443

Open
TimothyWillard opened this issue Jan 6, 2025 · 1 comment
Labels
config Relating to configuration files or their framework. enhancement Request for improvement or addition of new feature(s). medium priority Medium priority.

Comments

@TimothyWillard
Contributor

Label

config, enhancement

Priority Label

medium priority

Is your feature request related to a problem? Please describe.

Files are referenced throughout the current YAML configuration files, typically by a file path relative to the $PROJECT_PATH directory. This makes it challenging to share file-backed datasets across a configuration. In addition, each place that reads a file expects the file to be formatted in a specific way, which can be difficult to know ahead of time, so general-purpose files are hard to use without preprocessing.

Is your feature request related to a new application, scenario round, pathogen? Please describe.

No response

Describe the solution you'd like

A top-level files key in which each entry maps a generic name, which can be referenced throughout the configuration, to a value that specifies how the file should be read in. A quick example would look like:

files:
  subpops:
    path: model_input/subpop_structure.csv
    subpopulation: state
  r0_trend:
    path: model_input/R0.csv
    datetime: dt
    subpopulation: state
  age_contact:
    path: model_input/age_contact_matrix.txt
    rows: age_source
    columns: age_destination
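
As a rough illustration of how such a section might be consumed, the sketch below splits each entry into its path and the remaining keys that describe how to read the file. It assumes the YAML has already been parsed into a plain dictionary. `FileSpec` and `parse_files_section` are hypothetical names used only for this example and are not part of the existing code base.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FileSpec:
    """Hypothetical parsed form of one entry under the `files` key."""

    name: str
    path: Path
    # Remaining keys, e.g. {"subpopulation": "state"}, describing how to read the file.
    roles: dict[str, str] = field(default_factory=dict)


def parse_files_section(
    files: dict[str, dict[str, str]], project_path: Path
) -> dict[str, FileSpec]:
    """Turn the parsed `files` mapping into named specs (illustrative only)."""
    specs = {}
    for name, entry in files.items():
        if "path" not in entry:
            raise ValueError(f"files entry '{name}' is missing the 'path' key")
        # Everything other than `path` is treated as a role -> column mapping.
        roles = {key: value for key, value in entry.items() if key != "path"}
        specs[name] = FileSpec(name, project_path / entry["path"], roles)
    return specs


# The example configuration above, as it would look after YAML parsing.
files_section = {
    "subpops": {"path": "model_input/subpop_structure.csv", "subpopulation": "state"},
    "r0_trend": {"path": "model_input/R0.csv", "datetime": "dt", "subpopulation": "state"},
    "age_contact": {
        "path": "model_input/age_contact_matrix.txt",
        "rows": "age_source",
        "columns": "age_destination",
    },
}
# "/project" stands in for $PROJECT_PATH here.
specs = parse_files_section(files_section, Path("/project"))
```

Treating every key other than `path` as a role mapping is an assumption made for this sketch; the actual schema is still to be decided.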

There is still plenty to think through here. To enable this, a few pieces of code will need to be implemented:

  1. Changes to the configuration to enable this, including an intermediate solution where files can be read either from directly specified paths, as is done now, or through this new approach,
  2. A FileDataset ABC that serves as a representation for the different types of file datasets,
  3. Implementations for the different file dataset types that we would like to support, and
  4. A well-documented API that plugins can use to retrieve a file by name (the best way to do this is not yet clear, since we are trying to move away from a global configuration, so this may require providing a configuration object).
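
One way the pieces listed above could fit together is sketched below, using only the standard library. Apart from the `FileDataset` name taken from the list, everything here is an assumption made for illustration: the `CsvDataset` class, the `get_dataset` function, the `read` method, and the choice to pass the configuration in explicitly as a dictionary instead of reading a global.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class FileDataset(ABC):
    """Hypothetical base class for a file-backed dataset (item 2)."""

    def __init__(self, path: Path, **roles: str) -> None:
        self.path = Path(path)
        # Maps a role to a column name, e.g. {"subpopulation": "state"}.
        self.roles = roles

    @abstractmethod
    def read(self) -> list[dict[str, str]]:
        """Load the file into rows keyed by role name where one is configured."""


class CsvDataset(FileDataset):
    """One possible implementation (item 3): a CSV whose columns are renamed."""

    def read(self) -> list[dict[str, str]]:
        # Invert the mapping so "state" -> "subpopulation", and so on.
        rename = {column: role for role, column in self.roles.items()}
        with self.path.open(newline="") as handle:
            return [
                {rename.get(key, key): value for key, value in row.items()}
                for row in csv.DictReader(handle)
            ]


def get_dataset(config: dict, reference: str, project_path: Path) -> FileDataset:
    """Resolve a name from `files`, or fall back to a direct path (items 1 and 4)."""
    entry = config.get("files", {}).get(reference)
    if entry is None:
        # Intermediate behaviour: treat the reference as a directly specified path.
        return CsvDataset(project_path / reference)
    roles = {key: value for key, value in entry.items() if key != "path"}
    # A fuller version would pick the dataset class from the entry, not assume CSV.
    return CsvDataset(project_path / entry["path"], **roles)
```

With a configuration like the example above, a plugin would call something like `get_dataset(config, "r0_trend", project_path).read()` and receive rows keyed by `datetime` and `subpopulation` regardless of the column names in the underlying file. Passing `config` as an argument is just one option for avoiding a global configuration.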
@TimothyWillard added the enhancement, config, and medium priority labels on Jan 6, 2025
@TimothyWillard
Contributor Author

In the future, it could be useful to add a special groundtruth key that serves as input to the likelihood function. This would supersede GH-441.
