Source code for dffml.source.dataset.base

from ..wrapper import (
    context_managed_wrapper_source,
    ContextManagedWrapperSource,
)


[docs]def dataset_source(entrypoint_name) -> ContextManagedWrapperSource: r""" Allows us to quickly programmatically provide access to existing datasets via existing or custom sources. Under the hood this is an alias for :py:func:`dffml.source.wrapper.context_managed_wrapper_source` with ``qualname_suffix`` set to "DatasetSource". Examples -------- Say we have the following dataset hosted at http://example.com/my_training.csv Let's test this locally by creating a file and serving it from our local machine. Write the following file. **my_training.csv** .. code-block:: :test: :filepath: my_training.csv feed,face,dead,beef 0.0,0,0,0 0.1,1,10,100 0.2,2,20,200 0.3,3,30,300 0.4,4,40,400 We can start an HTTP server using Python .. code-block:: console :test: :daemon: 8080 $ python3 -m http.server 8080 We could write a dataset source to download and cache the contents locally as follows. We want to make sure that we validate the contents of datasets using SHA 384 hashes (see :py:func:`cached_download <dffml.util.net.cached_download>` for more details). Without hash validation we risk downloading the wrong file or potentially malicious files. **my_training.py** .. literalinclude:: /../examples/source/dataset/base/dataset_source/my_training.py :test: We can use it from Python in two different ways as follows **run.py** .. literalinclude:: /../examples/source/dataset/base/dataset_source/my_training_run.py :test: :filepath: run.py .. code-block:: console :test: :replace: cmds[0][-2] = cmds[0][-2].replace("8080", str(ctx["HTTP_SERVER"]["8080"])) $ python3 run.py http://localhost:8080/my_training.csv cache_dir Or we can use it from the command line .. code-block:: console :test: :replace: cmds[0][-1] = cmds[0][-1].replace("8080", str(ctx["HTTP_SERVER"]["8080"])) $ dffml list records \ -sources training=my_training:my_training_dataset.source \ -source-training-cache_dir cache_dir \ -source-training-url http://localhost:8080/my_training.csv """ return context_managed_wrapper_source( entrypoint_name, qualname_suffix="DatasetSource" )