Operations
Operation Implementations are subclasses of
dffml.df.base.OperationImplementation
. They are functions or classes
that can do anything: make HTTP requests, run inference, etc.
They don’t necessarily have to be written in Python. Although DFFML isn’t yet at the point where it can use operations written in other languages, that’s on the roadmap.
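For example, a new operation can be defined as a Python function with the @op decorator, following the same pattern the examples below use. The following is a minimal sketch; the say_hello operation and its definitions are made up purely for illustration.
>>> import asyncio
>>> from dffml import *
>>>
>>> @op(
...     inputs={"name": Definition("person_name", primitive="string")},
...     outputs={"greeting": Definition("greeting", primitive="string")},
... )
... def say_hello(name):
...     # Operations return a dict mapping output names to values
...     return {"greeting": f"Hello {name}"}
>>>
>>> dataflow = DataFlow.auto(say_hello, GetSingle)
>>> dataflow.seed.append(
...     Input(
...         value=[say_hello.op.outputs["greeting"].name],
...         definition=GetSingle.op.inputs["spec"],
...     )
... )
>>>
>>> async def main():
...     async for ctx, results in MemoryOrchestrator.run(dataflow, [
...         Input(value="World", definition=say_hello.op.inputs["name"])
...     ]):
...         print(results)
>>>
>>> asyncio.run(main())
{'greeting': 'Hello World'}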
dffml
pip install dffml
AcceptUserInput
Official
Accept input from stdin using python input()
Returns
- dict
A dictionary containing user input.
Examples
The following example shows how to use AcceptUserInput. (It assumes that the input read from stdin is “Data flow is awesome”.)
>>> import asyncio
>>> from dffml import *
>>>
>>> dataflow = DataFlow.auto(AcceptUserInput, GetSingle)
>>> dataflow.seed.append(
... Input(
... value=[AcceptUserInput.op.outputs["InputData"].name],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, {"input": []}):
... print(results)
>>>
>>> asyncio.run(main())
Enter the value: {'UserInput': 'Data flow is awesome'}
Stage: processing
Outputs
InputData: UserInput(type: str)
associate
Official
No description
Stage: output
Inputs
spec: associate_spec(type: List[str])
Outputs
output: associate_output(type: Dict[str, Any])
associate_definition
Official
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> feed_def = Definition(name="feed", primitive="string")
>>> dead_def = Definition(name="dead", primitive="string")
>>> output = Definition(name="output", primitive="string")
>>>
>>> feed_input = Input(value="my favorite value", definition=feed_def)
>>> face_input = Input(
... value="face", definition=output, parents=[feed_input]
... )
>>>
>>> dead_input = Input(
... value="my second favorite value", definition=dead_def
... )
>>> beef_input = Input(
... value="beef", definition=output, parents=[dead_input]
... )
>>>
>>> async def main():
... for value in ["feed", "dead"]:
... async for ctx, results in MemoryOrchestrator.run(
... DataFlow.auto(AssociateDefinition),
... [
... feed_input,
... face_input,
... dead_input,
... beef_input,
... Input(
... value={value: "output"},
... definition=AssociateDefinition.op.inputs["spec"],
... ),
... ],
... ):
... print(results)
>>>
>>> asyncio.run(main())
{'feed': 'face'}
{'dead': 'beef'}
Stage: output
Inputs
spec: associate_spec(type: List[str])
Outputs
output: associate_output(type: Dict[str, Any])
bz2_compress
Official
No description
Stage: processing
Inputs
input_file_path: decompressed_bz2_file_path(type: str)
output_file_path: compressed_bz2_file_path(type: str)
Outputs
output_path: compressed_output_bz2_file_path(type: str)
bz2_decompress
Official
No description
Stage: processing
Inputs
input_file_path: compressed_bz2_file_path(type: str)
output_file_path: decompressed_bz2_file_path(type: str)
Outputs
output_path: decompressed_output_bz2_file_path(type: str)
convert_list_to_records
Official
No description
Stage: processing
Inputs
matrix: matrix(type: List[List[Any]])
features: features(type: List[str])
keys: keys(type: List[str])
predict_features: predict_features(type: List[str])
unprocessed_matrix: unprocessed_matrix(type: List[List[Any]])
Outputs
records: records(type: Dict[str, Any])
convert_records_to_list
Official
No description
Stage: processing
Inputs
features: features(type: List[str])
predict_features: predict_features(type: List[str])
Outputs
matrix: matrix(type: List[List[Any]])
keys: keys(type: List[str])
unprocessed_matrix: unprocessed_matrix(type: List[List[Any]])
Args
source: Entrypoint
db_query_create_table
Official
Generates a create table query in the database.
Parameters
- table_name: str
The name of the table to be created.
- cols: list[str]
Columns of the table.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> sdb = SqliteDatabase(SqliteDatabaseConfig(filename="examples.db"))
>>>
>>> dataflow = DataFlow(
... operations={"db_query_create": db_query_create_table.op,},
... configs={"db_query_create": DatabaseQueryConfig(database=sdb),},
... seed=[],
... )
>>>
>>> inputs = [
... Input(
... value="myTable1",
... definition=db_query_create_table.op.inputs["table_name"],
... ),
... Input(
... value={
... "key": "real",
... "firstName": "text",
... "lastName": "text",
... "age": "real",
... },
... definition=db_query_create_table.op.inputs["cols"],
... ),
... ]
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... pass
>>>
>>> asyncio.run(main())
Stage: processing
Inputs
table_name: query_table(type: str)
cols: query_cols(type: Dict[str, str])
Args
database: Entrypoint
db_query_insert
Official
Generates an insert query in the database.
Parameters
- table_name: str
The name of the table to insert data into.
- data: dict
Data to be inserted into the table.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> sdb = SqliteDatabase(SqliteDatabaseConfig(filename="examples.db"))
>>>
>>> dataflow = DataFlow(
... operations={
... "db_query_insert": db_query_insert.op,
... "db_query_lookup": db_query_lookup.op,
... "get_single": GetSingle.imp.op,
... },
... configs={
... "db_query_lookup": DatabaseQueryConfig(database=sdb),
... "db_query_insert": DatabaseQueryConfig(database=sdb),
... },
... seed=[],
... )
>>>
>>> inputs = {
... "insert": [
... Input(
... value="myTable", definition=db_query_insert.op.inputs["table_name"],
... ),
... Input(
... value={"key": 10, "firstName": "John", "lastName": "Doe", "age": 16},
... definition=db_query_insert.op.inputs["data"],
... ),
... ],
... "lookup": [
... Input(
... value="myTable", definition=db_query_lookup.op.inputs["table_name"],
... ),
... Input(
... value=["firstName", "lastName", "age"],
... definition=db_query_lookup.op.inputs["cols"],
... ),
... Input(value=[], definition=db_query_lookup.op.inputs["conditions"],),
... Input(
... value=[db_query_lookup.op.outputs["lookups"].name],
... definition=GetSingle.op.inputs["spec"],
... ),
... ]
... }
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... if result:
... print(result)
>>>
>>> asyncio.run(main())
{'query_lookups': [{'firstName': 'John', 'lastName': 'Doe', 'age': 16}]}
Stage: processing
Inputs
table_name: query_table(type: str)
data: query_data(type: Dict[str, Any])
Args
database: Entrypoint
db_query_insert_or_update
Official
Automatically uses the better suited operation, insert query or update query.
Parameters
- table_name: str
The name of the table to insert or update data in.
- data: dict
Data to be inserted or updated in the table.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> sdb = SqliteDatabase(SqliteDatabaseConfig(filename="examples.db"))
>>>
>>> person = {"key": 11, "firstName": "John", "lastName": "Wick", "age": 38}
>>>
>>> dataflow = DataFlow(
... operations={
... "db_query_insert_or_update": db_query_insert_or_update.op,
... "db_query_lookup": db_query_lookup.op,
... "get_single": GetSingle.imp.op,
... },
... configs={
... "db_query_insert_or_update": DatabaseQueryConfig(database=sdb),
... "db_query_lookup": DatabaseQueryConfig(database=sdb),
... },
... seed=[],
... )
>>>
>>> inputs = {
... "insert_or_update": [
... Input(
... value="myTable", definition=db_query_update.op.inputs["table_name"],
... ),
... Input(
... value=person,
... definition=db_query_update.op.inputs["data"],
... ),
... ],
... "lookup": [
... Input(
... value="myTable",
... definition=db_query_lookup.op.inputs["table_name"],
... ),
... Input(
... value=["firstName", "lastName", "age"],
... definition=db_query_lookup.op.inputs["cols"],
... ),
... Input(value=[], definition=db_query_lookup.op.inputs["conditions"],),
... Input(
... value=[db_query_lookup.op.outputs["lookups"].name],
... definition=GetSingle.op.inputs["spec"],
... ),
... ],
... }
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... if result:
... print(result)
>>>
>>> asyncio.run(main())
{'query_lookups': [{'firstName': 'John', 'lastName': 'Wick', 'age': 38}]}
>>>
>>> person["age"] += 1
>>>
>>> asyncio.run(main())
{'query_lookups': [{'firstName': 'John', 'lastName': 'Wick', 'age': 39}]}
Stage: processing
Inputs
table_name: query_table(type: str)
data: query_data(type: Dict[str, Any])
Args
database: Entrypoint
db_query_lookup
Official
Generates a lookup query in the database.
Parameters
- table_name: str
The name of the table.
- cols: list[str]
Columns of the table.
- conditions: Conditions
Query conditions.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> sdb = SqliteDatabase(SqliteDatabaseConfig(filename="examples.db"))
>>>
>>> dataflow = DataFlow(
... operations={
... "db_query_lookup": db_query_lookup.op,
... "get_single": GetSingle.imp.op,
... },
... configs={"db_query_lookup": DatabaseQueryConfig(database=sdb),},
... seed=[],
... )
>>>
>>> inputs = {
... "lookup": [
... Input(
... value="myTable",
... definition=db_query_lookup.op.inputs["table_name"],
... ),
... Input(
... value=["firstName", "lastName", "age"],
... definition=db_query_lookup.op.inputs["cols"],
... ),
... Input(value=[], definition=db_query_lookup.op.inputs["conditions"],),
... Input(
... value=[db_query_lookup.op.outputs["lookups"].name],
... definition=GetSingle.op.inputs["spec"],
... ),
... ],
... }
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... if result:
... print(result)
>>>
>>> asyncio.run(main())
{'query_lookups': [{'firstName': 'John', 'lastName': 'Doe', 'age': 16}, {'firstName': 'John', 'lastName': 'Wick', 'age': 39}]}
Stage: processing
Inputs
table_name: query_table(type: str)
cols: query_cols(type: Dict[str, str])
conditions: query_conditions(type: Conditions)
Outputs
lookups: query_lookups(type: Dict[str, Any])
Args
database: Entrypoint
db_query_remove
Official
Generates a remove (delete) query in the database.
Parameters
- table_name: str
The name of the table to remove data from.
- conditions: Conditions
Query conditions.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> sdb = SqliteDatabase(SqliteDatabaseConfig(filename="examples.db"))
>>>
>>> dataflow = DataFlow(
... operations={
... "db_query_lookup": db_query_lookup.op,
... "db_query_remove": db_query_remove.op,
... "get_single": GetSingle.imp.op,
... },
... configs={
... "db_query_remove": DatabaseQueryConfig(database=sdb),
... "db_query_lookup": DatabaseQueryConfig(database=sdb),
... },
... seed=[],
... )
>>>
>>> inputs = {
... "remove": [
... Input(
... value="myTable",
... definition=db_query_remove.op.inputs["table_name"],
... ),
... Input(value=[],
... definition=db_query_remove.op.inputs["conditions"],),
... ],
... "lookup": [
... Input(
... value="myTable",
... definition=db_query_lookup.op.inputs["table_name"],
... ),
... Input(
... value=["firstName", "lastName", "age"],
... definition=db_query_lookup.op.inputs["cols"],
... ),
... Input(value=[], definition=db_query_lookup.op.inputs["conditions"],),
... Input(
... value=[db_query_lookup.op.outputs["lookups"].name],
... definition=GetSingle.op.inputs["spec"],
... ),
... ],
... }
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... if result:
... print(result)
>>>
>>> asyncio.run(main())
{'query_lookups': []}
Stage: processing
Inputs
table_name: query_table(type: str)
conditions: query_conditions(type: Conditions)
Args
database: Entrypoint
db_query_update
Official
Generates an update query in the database.
Parameters
- table_name: str
The name of the table to update data in.
- data: dict
Data to be updated in the table.
- conditions: list
List of query conditions.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> sdb = SqliteDatabase(SqliteDatabaseConfig(filename="examples.db"))
>>>
>>> dataflow = DataFlow(
... operations={
... "db_query_update": db_query_update.op,
... "db_query_lookup": db_query_lookup.op,
... "get_single": GetSingle.imp.op,
... },
... configs={
... "db_query_update": DatabaseQueryConfig(database=sdb),
... "db_query_lookup": DatabaseQueryConfig(database=sdb),
... },
... seed=[],
... )
>>>
>>> inputs = {
... "update": [
... Input(
... value="myTable",
... definition=db_query_update.op.inputs["table_name"],
... ),
... Input(
... value={
... "key": 10,
... "firstName": "John",
... "lastName": "Doe",
... "age": 17,
... },
... definition=db_query_update.op.inputs["data"],
... ),
... Input(value=[], definition=db_query_update.op.inputs["conditions"],),
... ],
... "lookup": [
... Input(
... value="myTable",
... definition=db_query_lookup.op.inputs["table_name"],
... ),
... Input(
... value=["firstName", "lastName", "age"],
... definition=db_query_lookup.op.inputs["cols"],
... ),
... Input(value=[], definition=db_query_lookup.op.inputs["conditions"],),
... Input(
... value=[db_query_lookup.op.outputs["lookups"].name],
... definition=GetSingle.op.inputs["spec"],
... ),
... ],
... }
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... if result:
... print(result)
>>>
>>> asyncio.run(main())
{'query_lookups': [{'firstName': 'John', 'lastName': 'Doe', 'age': 17}]}
Stage: processing
Inputs
table_name: query_table(type: str)
data: query_data(type: Dict[str, Any])
conditions: query_conditions(type: Conditions)
Args
database: Entrypoint
dffml.dataflow.run
Official
Starts a subflow self.config.dataflow
and adds inputs
to it.
Parameters
- inputs: dict
The inputs to add to the subflow. These should be a key value mapping of the context string to the inputs which should be seeded for that context string.
Returns
- dict
Maps context strings in inputs to output after running through dataflow.
Examples
The following shows how to use run_dataflow with its default behavior.
>>> import asyncio
>>> from dffml import *
>>>
>>> URL = Definition(name="URL", primitive="string")
>>>
>>> subflow = DataFlow.auto(GetSingle)
>>> subflow.definitions[URL.name] = URL
>>> subflow.seed.append(
... Input(
... value=[URL.name],
... definition=GetSingle.op.inputs["spec"]
... )
... )
>>>
>>> dataflow = DataFlow.auto(run_dataflow, GetSingle)
>>> dataflow.configs[run_dataflow.op.name] = RunDataFlowConfig(subflow)
>>> dataflow.seed.append(
... Input(
... value=[run_dataflow.op.outputs["results"].name],
... definition=GetSingle.op.inputs["spec"]
... )
... )
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, {
... "run_subflow": [
... Input(
... value={
... "dffml": [
... {
... "value": "https://github.com/intel/dffml",
... "definition": URL.name
... }
... ]
... },
... definition=run_dataflow.op.inputs["inputs"]
... )
... ]
... }):
... print(results)
>>>
>>> asyncio.run(main())
{'flow_results': {'dffml': {'URL': 'https://github.com/intel/dffml'}}}
The following shows how to use run_dataflow with custom inputs and outputs. This allows you to run a subflow as if it were an operation.
>>> import asyncio
>>> from dffml import *
>>>
>>> URL = Definition(name="URL", primitive="string")
>>>
>>> @op(
... inputs={"url": URL},
... outputs={"last": Definition("last_element_in_path", primitive="string")},
... )
... def last_path(url):
... return {"last": url.split("/")[-1]}
>>>
>>> subflow = DataFlow.auto(last_path, GetSingle)
>>> subflow.seed.append(
... Input(
... value=[last_path.op.outputs["last"].name],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>>
>>> dataflow = DataFlow.auto(run_dataflow, GetSingle)
>>> dataflow.operations[run_dataflow.op.name] = run_dataflow.op._replace(
... inputs={"URL": URL},
... outputs={last_path.op.outputs["last"].name: last_path.op.outputs["last"]},
... expand=[],
... )
>>> dataflow.configs[run_dataflow.op.name] = RunDataFlowConfig(subflow)
>>> dataflow.seed.append(
... Input(
... value=[last_path.op.outputs["last"].name],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>> dataflow.update(auto_flow=True)
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(
... dataflow,
... {
... "run_subflow": [
... Input(value="https://github.com/intel/dffml", definition=URL)
... ]
... },
... ):
... print(results)
>>>
>>> asyncio.run(main())
{'last_element_in_path': 'dffml'}
Stage: processing
Inputs
inputs: flow_inputs(type: Dict[str,Any])
Outputs
results: flow_results(type: Dict[str,Any])
Args
dataflow: DataFlow
dffml.mapping.create
Official
Creates a mapping of a given key and value.
Parameters
- key: str
The key for the mapping.
- value: Any
The value for the mapping.
Returns
- dict
A dictionary containing the mapping created.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> dataflow = DataFlow.auto(create_mapping, GetSingle)
>>> dataflow.seed.append(
... Input(
... value=[create_mapping.op.outputs["mapping"].name],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>> inputs = [
... Input(
... value="key1", definition=create_mapping.op.inputs["key"],
... ),
... Input(
... value=42, definition=create_mapping.op.inputs["value"],
... ),
... ]
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... print(result)
>>>
>>> asyncio.run(main())
{'mapping': {'key1': 42}}
Stage: processing
Inputs
key: key(type: str)
value: value(type: generic)
Outputs
mapping: mapping(type: map)
dffml.mapping.extract
Official
Extracts value from a given mapping.
Parameters
- mapping: dict
The mapping to extract the value from.
- traverse: list[str]
A list of keys to traverse through the mapping dictionary and extract the values.
Returns
- dict
A dictionary containing the value of the keys.
Examples
>>> import asyncio
>>> from dffml import *
>>>
>>> dataflow = DataFlow.auto(mapping_extract_value, GetSingle)
>>>
>>> dataflow.seed.append(
... Input(
... value=[mapping_extract_value.op.outputs["value"].name],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>> inputs = [
... Input(
... value={"key1": {"key2": 42}},
... definition=mapping_extract_value.op.inputs["mapping"],
... ),
... Input(
... value=["key1", "key2"],
... definition=mapping_extract_value.op.inputs["traverse"],
... ),
... ]
>>>
>>> async def main():
... async for ctx, result in MemoryOrchestrator.run(dataflow, inputs):
... print(result)
>>>
>>> asyncio.run(main())
{'value': 42}
Stage: processing
Inputs
mapping: mapping(type: map)
traverse: mapping_traverse(type: List[str])
Outputs
value: value(type: generic)
dffml.model.predict
Official
Predict using dffml models.
Parameters
- features: dict
A dictionary containing feature names and feature values.
Returns
- dict
A dictionary containing prediction.
Examples
The following example shows how to use model_predict.
>>> import asyncio
>>> from dffml import *
>>>
>>> slr_model = SLRModel(
... features=Features(Feature("Years", int, 1)),
... predict=Feature("Salary", int, 1),
... location="tempdir",
... )
>>> dataflow = DataFlow(
... operations={
... "prediction_using_model": model_predict,
... "get_single": GetSingle,
... },
... configs={"prediction_using_model": ModelPredictConfig(model=slr_model)},
... )
>>> dataflow.seed.append(
... Input(
... value=[model_predict.op.outputs["prediction"].name],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>>
>>> async def main():
... await train(
... slr_model,
... {"Years": 0, "Salary": 10},
... {"Years": 1, "Salary": 20},
... {"Years": 2, "Salary": 30},
... {"Years": 3, "Salary": 40},
... )
... inputs = [
... Input(
... value={"Years": 4}, definition=model_predict.op.inputs["features"],
... )
... ]
... async for ctx, results in MemoryOrchestrator.run(dataflow, inputs):
... print(results)
>>>
>>> asyncio.run(main())
{'model_predictions': {'Salary': {'confidence': 1.0, 'value': 50}}}
Stage: processing
Inputs
features: record_features(type: Dict[str, Any])
Outputs
prediction: model_predictions(type: Dict[str, Any])
Args
model: Entrypoint
extract_tar_archive
Official
Extracts a given tar file.
Parameters
- input_file_path: str
Path to the tar file
- output_directory_path: str
Path where all the files should be extracted
Returns
- dict
Path to the directory where the archive has been extracted
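Examples
The following is a minimal sketch of wiring this operation into a dataflow. It assumes extract_tar_archive is importable from the top-level dffml package like the other operations in this section, and it creates a small archive with the standard library first; the file and directory names are made up for illustration.
>>> import asyncio
>>> import pathlib
>>> import tarfile
>>> from dffml import *
>>>
>>> # Create a small tar archive to extract
>>> _ = pathlib.Path("hello.txt").write_text("hello world")
>>> with tarfile.open("hello.tar.gz", mode="w:gz") as archive:
...     archive.add("hello.txt")
>>>
>>> dataflow = DataFlow.auto(extract_tar_archive, GetSingle)
>>> dataflow.seed.append(
...     Input(
...         value=[extract_tar_archive.op.outputs["output_path"].name],
...         definition=GetSingle.op.inputs["spec"],
...     )
... )
>>> inputs = [
...     Input(
...         value="hello.tar.gz",
...         definition=extract_tar_archive.op.inputs["input_file_path"],
...     ),
...     Input(
...         value="extracted",
...         definition=extract_tar_archive.op.inputs["output_directory_path"],
...     ),
... ]
>>>
>>> async def main():
...     async for ctx, results in MemoryOrchestrator.run(dataflow, inputs):
...         pass
>>>
>>> asyncio.run(main())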
Stage: processing
Inputs
input_file_path: tar_file(type: str)
output_directory_path: directory(type: str)
Outputs
output_path: output_directory_path(type: str)
extract_zip_archive
Official
Extracts a given zip file.
Parameters
- input_file_path: str
Path to the zip file
- output_directory_path: str
Path where all the files should be extracted
Returns
- dict
Path to the directory where the archive has been extracted
Stage: processing
Inputs
input_file_path: zip_file(type: str)
output_directory_path: directory(type: str)
Outputs
output_path: output_directory_path(type: str)
get_multi
Official
Output operation to get all Inputs matching given definitions.
Parameters
- spec: list
List of definition names. Any Inputs with matching definition will be returned.
Returns
- dict
Maps definition names to all the Inputs of that definition
Examples
The following shows how to grab all Inputs with the URL definition. If we had run an operation which outputs a URL, that output URL would also have been returned to us.
>>> import asyncio
>>> from dffml import *
>>>
>>> URL = Definition(name="URL", primitive="string")
>>>
>>> dataflow = DataFlow.auto(GetMulti)
>>> dataflow.seed.append(
... Input(
... value=[URL.name],
... definition=GetMulti.op.inputs["spec"]
... )
... )
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, [
... Input(
... value="https://github.com/intel/dffml",
... definition=URL
... ),
... Input(
... value="https://github.com/intel/cve-bin-tool",
... definition=URL
... )
... ]):
... print(results)
...
>>> asyncio.run(main())
{'URL': ['https://github.com/intel/dffml', 'https://github.com/intel/cve-bin-tool']}
Stage: output
Inputs
spec: get_multi_spec(type: array)
Outputs
output: get_multi_output(type: map)
get_single
Official
Output operation to get a single Input for each definition given.
Parameters
- spec: list
List of definition names. An Input with matching definition will be returned.
Returns
- dict
Maps definition names to an Input of that definition
Examples
The following shows how to grab an Input with the URL definition. If we had run an operation which outputs a URL, that output URL could also have been returned to us.
>>> import asyncio
>>> from dffml import *
>>>
>>> URL = Definition(name="URL", primitive="string")
>>> ORG = Definition(name="ORG", primitive="string")
>>>
>>> dataflow = DataFlow.auto(GetSingle)
>>> dataflow.seed.append(
... Input(
... value=[{"Repo Link": URL.name}, ORG.name],
... definition=GetSingle.op.inputs["spec"]
... )
... )
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, [
... Input(
... value="https://github.com/intel/dffml",
... definition=URL
... ),
... Input(
... value="Intel",
... definition=ORG
... )
... ]):
... print(results)
...
>>> asyncio.run(main())
{'ORG': 'Intel', 'Repo Link': 'https://github.com/intel/dffml'}
Stage: output
Inputs
spec: get_single_spec(type: array)
Outputs
output: get_single_output(type: map)
group_by
Official
No description
Stage: output
Inputs
spec: group_by_spec(type: Dict[str, Any])
Outputs
output: group_by_output(type: Dict[str, List[Any]])
gz_compress
Official
No description
Stage: processing
Inputs
input_file_path: decompressed_gz_file_path(type: str)
output_file_path: compressed_gz_file_path(type: str)
Outputs
output_path: compressed_output_gz_file_path(type: str)
gz_decompress
Official
No description
Stage: processing
Inputs
input_file_path: compressed_gz_file_path(type: str)
output_file_path: decompressed_gz_file_path(type: str)
Outputs
output_path: decompressed_output_gz_file_path(type: str)
literal_eval
Official
Evaluate the input using ast.literal_eval()
Parameters
- str_to_eval: str
A string to be evaluated.
Returns
- dict
A dict containing python literal.
Examples
The following example shows how to use literal_eval.
>>> import asyncio
>>> from dffml import *
>>>
>>> dataflow = DataFlow.auto(literal_eval, GetSingle)
>>> dataflow.seed.append(
... Input(
... value=[literal_eval.op.outputs["str_after_eval"].name,],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>> inputs = [
... Input(
... value="[1,2,3]",
... definition=literal_eval.op.inputs["str_to_eval"],
... parents=None,
... )
... ]
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, inputs):
... print(results)
>>>
>>> asyncio.run(main())
{'EvaluatedStr': [1, 2, 3]}
Stage: processing
Inputs
str_to_eval: InputStr(type: str)
Outputs
str_after_eval: EvaluatedStr(type: generic)
make_tar_archive
Official
Creates a tar file of a directory.
Parameters
- input_directory_path: str
Path to the directory to be archived as a tarfile.
- output_file_path: str
Path where the output archive should be saved (should include the file name)
Returns
- dict
Path to the created tar file.
Stage: processing
Inputs
input_directory_path: directory(type: str)
output_file_path: tar_file(type: str)
Outputs
output_path: output_tarfile_path(type: str)
make_zip_archive
Official
Creates a zip file of a directory.
Parameters
- input_directory_path: str
Path to the directory to be archived
- output_file_path: str
Path where the output archive should be saved (should include the file name)
Returns
- dict
Path to the output zip file
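Examples
The following is a minimal sketch of wiring this operation into a dataflow. It assumes make_zip_archive is importable from the top-level dffml package like the other operations in this section; the directory and file names are made up for illustration.
>>> import asyncio
>>> import pathlib
>>> from dffml import *
>>>
>>> # Create a small directory to archive
>>> pathlib.Path("to_archive").mkdir(exist_ok=True)
>>> _ = pathlib.Path("to_archive", "hello.txt").write_text("hello world")
>>>
>>> dataflow = DataFlow.auto(make_zip_archive, GetSingle)
>>> dataflow.seed.append(
...     Input(
...         value=[make_zip_archive.op.outputs["output_path"].name],
...         definition=GetSingle.op.inputs["spec"],
...     )
... )
>>> inputs = [
...     Input(
...         value="to_archive",
...         definition=make_zip_archive.op.inputs["input_directory_path"],
...     ),
...     Input(
...         value="archive.zip",
...         definition=make_zip_archive.op.inputs["output_file_path"],
...     ),
... ]
>>>
>>> async def main():
...     async for ctx, results in MemoryOrchestrator.run(dataflow, inputs):
...         pass
>>>
>>> asyncio.run(main())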
Stage: processing
Inputs
input_directory_path: directory(type: str)
output_file_path: zip_file(type: str)
Outputs
output_path: output_zipfile_path(type: str)
multiply
Official
Multiply record values
Parameters
- multiplicand: generic
An arithmetic type value.
- multiplier: generic
An arithmetic type value.
Returns
- dict
A dict containing the product.
Examples
The following example shows how to use multiply.
>>> import asyncio
>>> from dffml import *
>>>
>>> dataflow = DataFlow.auto(multiply, GetSingle)
>>> dataflow.seed.append(
... Input(
... value=[multiply.op.outputs["product"].name,],
... definition=GetSingle.op.inputs["spec"],
... )
... )
>>> inputs = [
... Input(
... value=12,
... definition=multiply.op.inputs["multiplicand"],
... ),
... Input(
... value=3,
... definition=multiply.op.inputs["multiplier"],
... ),
... ]
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, inputs):
... print(results)
>>>
>>> asyncio.run(main())
{'product': 36}
Stage: processing
Inputs
multiplicand: multiplicand_def(type: generic)
multiplier: multiplier_def(type: generic)
Outputs
product: product(type: generic)
print_output
Official
Print the output on stdout using python print()
Parameters
- data: Any
A python literal to be printed.
Examples
The following example shows how to use print_output.
>>> import asyncio
>>> from dffml import *
>>>
>>> dataflow = DataFlow.auto(print_output)
>>> inputs = [
... Input(
... value="print_output example", definition=print_output.op.inputs["data"]
... )
... ]
>>>
>>> async def main():
... async for ctx, results in MemoryOrchestrator.run(dataflow, inputs):
... pass
>>>
>>> asyncio.run(main())
print_output example
Stage: processing
Inputs
data: DataToPrint(type: generic)
xz_compress
Official
No description
Stage: processing
Inputs
input_file_path: decompressed_xz_file_path(type: str)
output_file_path: compressed_xz_file_path(type: str)
Outputs
output_path: compressed_output_xz_file_path(type: str)
xz_decompress
Official
No description
Stage: processing
Inputs
input_file_path: compressed_xz_file_path(type: str)
output_file_path: decompressed_xz_file_path(type: str)
Outputs
output_path: decompressed_output_xz_file_path(type: str)
dffml_feature_git
pip install dffml-feature-git
check_if_valid_git_repository_URL
Official
No description
Stage: processing
Inputs
URL: URL(type: string)
Outputs
valid: valid_git_repository_URL(type: boolean)
cleanup_git_repo
Official
No description
Stage: cleanup
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
clone_git_repo
Official
No description
Stage: processing
Inputs
URL: URL(type: string)
Outputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
Conditions
valid_git_repository_URL: boolean
count_authors
Official
No description
Stage: processing
Inputs
author_lines: author_line_count(type: Dict[str, int])
Outputs
authors: author_count(type: int)
git_commits
Official
No description
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
branch: git_branch(type: str)
start_end: date_pair(type: List[date])
Outputs
commits: commit_count(type: int)
git_repo_author_lines_for_dates
Official
No description
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
branch: git_branch(type: str)
start_end: date_pair(type: List[date])
Outputs
author_lines: author_line_count(type: Dict[str, int])
git_repo_checkout
Official
No description
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
commit: git_commit(type: string)
Outputs
repo: git_repository_checked_out(type: Dict[str, str])
directory: str
URL: str(default: None)
commit: str(default: None)
git_repo_commit_from_date
Official
No description
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
branch: git_branch(type: str)
date: date(type: string)
Outputs
commit: git_commit(type: string)
git_repo_default_branch
Official
No description
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
Outputs
branch: git_branch(type: str)
Conditions
no_git_branch_given: boolean
git_repo_release
Official
Was there a release within this date range?
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
branch: git_branch(type: str)
start_end: date_pair(type: List[date])
Outputs
present: release_within_period(type: bool)
lines_of_code_by_language
Official
This operation relies on tokei
. Here’s how to install version 10.1.1;
check its releases page to make sure you’re installing the latest version.
On Linux
$ curl -sSL 'https://github.com/XAMPPRocky/tokei/releases/download/v10.1.1/tokei-v10.1.1-x86_64-unknown-linux-gnu.tar.gz' \
| tar -xvz && \
echo '22699e16e71f07ff805805d26ee86ecb9b1052d7879350f7eb9ed87beb0e6b84fbb512963d01b75cec8e80532e4ea29a tokei' | sha384sum -c - && \
sudo mv tokei /usr/local/bin/
On OSX
$ curl -sSL 'https://github.com/XAMPPRocky/tokei/releases/download/v10.1.1/tokei-v10.1.1-x86_64-apple-darwin.tar.gz' \
| tar -xvz && \
echo '8c8a1d8d8dd4d8bef93dabf5d2f6e27023777f8553393e269765d7ece85e68837cba4374a2615d83f071dfae22ba40e2 tokei' | sha384sum -c - && \
sudo mv tokei /usr/local/bin/
Stage: processing
Inputs
repo: git_repository_checked_out(type: Dict[str, str])
directory: str
URL: str(default: None)
commit: str(default: None)
Outputs
lines_by_language: lines_by_language_count(type: Dict[str, Dict[str, int]])
lines_of_code_to_comments
Official
No description
Stage: processing
Inputs
langs: lines_by_language_count(type: Dict[str, Dict[str, int]])
Outputs
code_to_comment_ratio: language_to_comment_ratio(type: int)
make_quarters
Official
No description
Stage: processing
Inputs
number: quarters(type: int)
Outputs
quarters: quarter(type: int)
quarters_back_to_date
Official
No description
Stage: processing
Inputs
date: quarter_start_date(type: int)
number: quarter(type: int)
Outputs
date: date(type: string)
start_end: date_pair(type: List[date])
work
Official
No description
Stage: processing
Inputs
author_lines: author_line_count(type: Dict[str, int])
Outputs
work: work_spread(type: int)
shouldi
pip install shouldi
cleanup_pypi_package
Official
Remove the directory containing the source code release.
Stage: cleanup
Inputs
directory: run_bandit.inputs.pkg(type: str)
pypi_package_contents
Official
Download a source code release and extract it to a temporary directory.
Stage: processing
Inputs
url: pypi_package_contents.inputs.url(type: str)
Outputs
directory: run_bandit.inputs.pkg(type: str)
pypi_package_json
Official
Download the information on the package in JSON format.
Stage: processing
Inputs
package: safety_check.inputs.package(type: str)
Outputs
version: safety_check.inputs.version(type: str)
url: pypi_package_contents.inputs.url(type: str)
run_bandit
Official
CLI usage: dffml service dev run -log debug shouldi.bandit:run_bandit -pkg .
Stage: processing
Inputs
pkg: run_bandit.inputs.pkg(type: str)
Outputs
result: run_bandit.outputs.result(type: map)
safety_check
Official
No description
Stage: processing
Inputs
package: safety_check.inputs.package(type: str)
version: safety_check.inputs.version(type: str)
Outputs
result: safety_check.outputs.result(type: int)
dffml_operations_nlp
pip install dffml-operations-nlp
collect_output
Official
No description
Stage: processing
Inputs
sentence: sentence(type: string)
length: source_length(type: string)
Outputs
all: all_sentences(type: List[string])
count_vectorizer
Official
Converts a collection of text documents to a matrix of token counts using sklearn CountVectorizer’s fit_transform method. For details on parameters check https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html Parameters specific to this operation are described below.
Parameters
- text: list
A list of strings.
- get_feature_names: bool
If True, return feature names using the get_feature_names method of CountVectorizer.
Returns
- result: list
A list containing token counts and feature names if get_feature_names is True.
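Examples
As noted above, this operation uses scikit-learn’s CountVectorizer under the hood. The following sketch shows the underlying computation in plain scikit-learn, to illustrate the token counts and feature names the operation returns; the sample sentences are made up for illustration.
>>> from sklearn.feature_extraction.text import CountVectorizer
>>>
>>> text = ["dataflows are awesome", "dataflows are fun"]
>>> vectorizer = CountVectorizer()
>>> counts = vectorizer.fit_transform(text)
>>> counts.toarray()
array([[1, 1, 1, 0],
       [1, 1, 0, 1]])
>>> # get_feature_names() on older scikit-learn releases
>>> print(*vectorizer.get_feature_names_out())
are awesome dataflows fun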
Stage: processing
Inputs
text: count_vectorizer.inputs.text(type: array)
encoding: count_vectorizer.inputs.encoding(type: str)
decode_error: count_vectorizer.inputs.decode_error(type: str)
strip_accents: count_vectorizer.inputs.strip_accents(type: str)
lowercase: count_vectorizer.inputs.lowercase(type: bool)
stop_words: count_vectorizer.inputs.stop_words(type: str)
token_pattern: count_vectorizer.inputs.token_pattern(type: str)
ngram_range: count_vectorizer.inputs.ngram_range(type: array)
analyzer: count_vectorizer.inputs.analyzer(type: str)
max_df: count_vectorizer.inputs.max_df(type: float)
min_df: count_vectorizer.inputs.min_df(type: float)
max_features: count_vectorizer.inputs.max_features(type: int)
vocabulary: count_vectorizer.inputs.vocabulary(type: map)
binary: count_vectorizer.inputs.binary(type: bool)
get_feature_names: count_vectorizer.inputs.get_feature_names(type: bool)
Outputs
result: count_vectorizer.outputs.result(type: array)
extract_array_from_matrix
Official
Returns row from input_matrix based on index of single_text_example in collected_text.
Parameters
- single_text_example: str
String to be used for indexing into collected_text.
- collected_text: list
List of strings.
- input_matrix: list
A 2-D matrix where each row represents vector corresponding to single_text_example.
Returns
result: A 1-d array.
Stage: processing
Inputs
single_text_example: extract_array_from_matrix.inputs.single_text_example(type: str)
collected_text: extract_array_from_matrix.inputs.collected_text(type: array)
input_matrix: extract_array_from_matrix.inputs.input_matrix(type: array)
Outputs
result: extract_array_from_matrix.outputs.result(type: array)
get_embedding
Official
Maps words of text data to their corresponding word vectors.
Parameters
- text: str
String to be converted to word vectors.
- max_len: int
Maximum length of sentence. If the length of text > max_len, text is truncated to have length = max_len. If the length of text < max_len, text is padded with pad_token such that len(text) = max_len.
- pad_token: str
Token to be used for padding text if len(text) < max_len
- spacy_model: str
Spacy model to be used for assigning vectors to tokens.
Returns
result: A 2-d array of shape (max_len, embedding_size of vectors).
Stage: processing
Inputs
text: text_def(type: str)
spacy_model: spacy_model_name_def(type: str)
max_len: max_len_def(type: int)
pad_token: pad_token_def(type: str)
Outputs
embedding: embedding(type: generic)
get_noun_chunks
Official
Extracts the noun chunks from text.
Parameters
- text: str
String to extract noun chunks from.
- spacy_model: str
A spacy model with the capability of parsing.
Returns
- result: list
A list containing noun chunks.
Stage: processing
Inputs
text: get_noun_chunks.inputs.text(type: str)
spacy_model: get_noun_chunks.inputs.spacy_model(type: str)
Outputs
result: get_noun_chunks.outputs.result(type: array)
get_sentences
Official
Extracts the sentences from text.
Parameters
- text: str
String to extract sentences from.
- spacy_model: str
A spacy model with the capability of parsing. Sentence boundaries are calculated from the syntactic dependency parse.
Returns
- result: list
A list containing sentences.
Stage: processing
Inputs
text: get_sentences.inputs.text(type: str)
spacy_model: get_sentences.inputs.spacy_model(type: str)
Outputs
result: get_sentences.outputs.result(type: array)
get_similarity
Official
Calculates similarity between two text strings as a score between 0 and 1.
Parameters
- text_1: str
First string to compare.
- text_2: str
Second string to compare.
- spacy_model: str
Spacy model to be used for extracting word vectors which are used for calculating similarity.
Returns
- result: float
A similarity score between 0 and 1.
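Examples
The score is the kind of vector-based similarity that spaCy computes. The following sketch shows roughly the underlying computation using plain spaCy, assuming a model with word vectors such as en_core_web_md is installed; the sample sentences are made up for illustration.
>>> import spacy
>>>
>>> nlp = spacy.load("en_core_web_md")
>>> doc_1 = nlp("I like data flows")
>>> doc_2 = nlp("I enjoy dataflows")
>>> score = doc_1.similarity(doc_2)
>>> 0 < score <= 1
True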
Stage: processing
Inputs
text_1: get_similarity.inputs.text_1(type: str)
text_2: get_similarity.inputs.text_2(type: str)
spacy_model: get_similarity.inputs.spacy_model(type: str)
Outputs
result: get_similarity.outputs.result(type: float)
lemmatizer
Official
Reduce words in the text to their dictionary form (lemma)
Parameters
- text: str
String to lemmatize.
- spacy_model: str
Spacy model to be used for lemmatization.
Returns
- result: list
A list containing base form of the words.
Stage: processing
Inputs
text: lemmatizer.inputs.text(type: str)
spacy_model: lemmatizer.inputs.spacy_model(type: str)
Outputs
result: lemmatizer.outputs.result(type: array)
pos_tagger
Official
Assigns part-of-speech tags to text.
Parameters
- text: str
Text to be tagged.
- spacy_model: str
A spacy model with tagger and parser.
Returns
- result: list
A list containing tuples of word and their respective pos tag.
Stage: processing
Inputs
text: pos_tagger.inputs.text(type: str)
spacy_model: pos_tagger.inputs.spacy_model(type: str)
tag_type: pos_tagger.inputs.tag_type(type: str)
Outputs
result: pos_tagger.outputs.result(type: array)
remove_stopwords
Official
Removes stopwords from text data.
Parameters
- text: str
String to be cleaned.
- custom_stop_words: List[str], default = None
List of words to be considered as stop words.
Returns
result: A string without stop words.
Stage: processing
Inputs
text: remove_stopwords.inputs.text(type: str)
custom_stop_words: remove_stopwords.inputs.custom_stop_words(type: array)
Outputs
result: remove_stopwords.outputs.result(type: str)
tfidf_vectorizer
Official
Convert a collection of raw documents to a matrix of TF-IDF features using sklearn TfidfVectorizer’s fit_transform method. For details on parameters check https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html Parameters specific to this operation are described below.
Parameters
- text: list
A list of strings.
- get_feature_names: bool
If True, return feature names using the get_feature_names method of TfidfVectorizer.
Returns
- result: list
A list containing token counts and feature names if get_feature_names is True.
Stage: processing
Inputs
text: tfidf_vectorizer.inputs.text(type: array)
encoding: tfidf_vectorizer.inputs.encoding(type: str)
decode_error: tfidf_vectorizer.inputs.decode_error(type: str)
strip_accents: tfidf_vectorizer.inputs.strip_accents(type: str)
lowercase: tfidf_vectorizer.inputs.lowercase(type: bool)
analyzer: tfidf_vectorizer.inputs.analyzer(type: str)
stop_words: tfidf_vectorizer.inputs.stop_words(type: str)
token_pattern: tfidf_vectorizer.inputs.token_pattern(type: str)
ngram_range: tfidf_vectorizer.inputs.ngram_range(type: array)
max_df: tfidf_vectorizer.inputs.max_df(type: str)
min_df: tfidf_vectorizer.inputs.min_df(type: str)
max_features: tfidf_vectorizer.inputs.max_features(type: str)
vocabulary: tfidf_vectorizer.inputs.vocabulary(type: str)
binary: tfidf_vectorizer.inputs.binary(type: bool)
norm: tfidf_vectorizer.inputs.norm(type: str)
use_idf: tfidf_vectorizer.inputs.use_idf(type: bool)
smooth_idf: tfidf_vectorizer.inputs.smooth_idf(type: bool)
sublinear_tf: tfidf_vectorizer.inputs.sublinear_tf(type: bool)
get_feature_names: tfidf_vectorizer.inputs.get_feature_names(type: bool)
Outputs
result: tfidf_vectorizer.outputs.result(type: array)
dffml_operations_image
pip install dffml-operations-image
Haralick
Official
Computes Haralick texture features
Stage: processing
Inputs
f: Haralick.inputs.f(type: array)
ignore_zeros: Haralick.inputs.ignore_zeros(type: bool)
preserve_haralick_bug: Haralick.inputs.preserve_haralick_bug(type: bool)
compute_14th_feature: Haralick.inputs.compute_14th_feature(type: bool)
return_mean: Haralick.inputs.return_mean(type: bool)
return_mean_ptp: Haralick.inputs.return_mean_ptp(type: bool)
use_x_minus_y_variance: Haralick.inputs.use_x_minus_y_variance(type: bool)
distance: Haralick.inputs.distance(type: int)
Outputs
result: Haralick.outputs.result(type: array)
HuMoments
Official
Calculates seven Hu invariants
Stage: processing
Inputs
m: HuMoments.inputs.m(type: array)
Outputs
result: HuMoments.outputs.result(type: array)
calcHist
Official
Calculates a histogram
Stage: processing
Inputs
images: calcHist.inputs.images(type: array)
channels: calcHist.inputs.channels(type: array)
mask: calcHist.inputs.mask(type: array)
histSize: calcHist.inputs.histSize(type: array)
ranges: calcHist.inputs.ranges(type: array)
Outputs
result: calcHist.outputs.result(type: array)
convert_color
Official
Converts images from one color space to another
Stage: processing
Inputs
src: convert_color.inputs.src(type: array)
code: convert_color.inputs.code(type: str)
Outputs
result: convert_color.outputs.result(type: array)
flatten
Official
No description
Stage: processing
Inputs
array: flatten.inputs.array(type: array)
Outputs
result: flatten.outputs.result(type: array)
normalize
Official
Normalizes arrays
Stage: processing
Inputs
src: normalize.inputs.src(type: array)
alpha: normalize.inputs.alpha(type: int)
beta: normalize.inputs.beta(type: int)
norm_type: normalize.inputs.norm_type(type: int)
dtype: normalize.inputs.dtype(type: int)
mask: normalize.inputs.mask(type: array)
Outputs
result: normalize.outputs.result(type: array)
resize
Official
Resizes an image array to the specified new dimensions.
If the new dimensions are 2D, the image is converted to grayscale.
- To enlarge the image (src dimensions < dsize),
it will resize the image with INTER_CUBIC interpolation.
- To shrink the image (src dimensions > dsize),
it will resize the image with INTER_AREA interpolation.
Stage: processing
Inputs
src: resize.inputs.src(type: array)
dsize: resize.inputs.dsize(type: array)
fx: resize.inputs.fx(type: float)
fy: resize.inputs.fy(type: float)
interpolation: resize.inputs.interpolation(type: int)
Outputs
result: resize.outputs.result(type: array)
dffml_operations_deploy
pip install dffml-operations-deploy
check_if_default_branch
Official
No description
Stage: processing
Inputs
payload: git_payload(type: Dict[str,Any])
Outputs
is_default_branch: is_default_branch(type: bool)
check_secret_match
Official
No description
Stage: processing
Inputs
headers: webhook_headers(type: Dict[str,Any])
body: payload(type: bytes)
Outputs
git_payload: git_payload(type: Dict[str,Any])
Args
secret: Entrypoint
docker_build_image
Official
No description
Stage: processing
Inputs
docker_commands: docker_commands(type: Dict[str,Any])
Outputs
build_status: is_image_built(type: bool)
Conditions
is_default_branch: bool
got_running_containers: bool
get_image_tag
Official
No description
Stage: processing
Inputs
payload: git_payload(type: Dict[str,Any])
Outputs
image_tag: docker_image_tag(type: str)
get_running_containers
Official
No description
Stage: processing
Inputs
tag: docker_image_tag(type: str)
Outputs
running_containers: docker_running_containers(type: List[str])
get_status_running_containers
Official
No description
Stage: processing
Inputs
containers: docker_running_containers(type: List[str])
Outputs
status: got_running_containers(type: bool)
get_url_from_payload
Official
No description
Stage: processing
Inputs
payload: git_payload(type: Dict[str,Any])
Outputs
url: URL(type: string)
parse_docker_commands
Official
No description
Stage: processing
Inputs
repo: git_repository(type: Dict[str, str])
directory: str
URL: str(default: None)
image_tag: docker_image_tag(type: str)
Outputs
docker_commands: docker_commands(type: Dict[str,Any])
restart_running_containers
Official
No description
Stage: processing
Inputs
docker_commands: docker_commands(type: Dict[str,Any])
containers: docker_running_containers(type: List[str])
Outputs
containers: docker_restarted_containers(type: str)
Conditions
is_image_built: bool
dffml_operations_data
pip install dffml-operations-data
principal_component_analysis
Official
Decomposes the data into (n_samples, n_components) using the PCA method.
Parameters
- data: List[List[int]]
Data to be decomposed.
- n_components: int
Number of columns the data should have after decomposition.
Returns
result: Data having dimensions (n_samples, n_components)
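Examples
The following sketch shows roughly the underlying decomposition using plain scikit-learn, to illustrate the shape of the result; the sample matrix is made up for illustration.
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>>
>>> data = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [3, 5, 7, 9]])
>>> reduced = PCA(n_components=2).fit_transform(data)
>>> reduced.shape
(3, 2)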
Stage: processing
Inputs
data: input_data(type: List[List[int]])
n_components: n_components(type: int)
Outputs
result: output_data(type: List[List[int]])
singular_value_decomposition
Official
Decomposes the data into (n_samples, n_components) using the SVD method.
Parameters
- data: List[List[int]]
Data to be decomposed.
- n_components: int
Number of columns the data should have after decomposition.
Returns
result: Data having dimensions (n_samples, n_components)
Stage: processing
Inputs
data: input_data(type: List[List[int]])
n_components: n_components(type: int)
n_iter: n_iter(type: int)
random_state: random_state(type: int)
Outputs
result: output_data(type: List[List[int]])
simple_imputer
Official
Imputation method for missing values
Parameters
- data: List[List[int]]
Data in which missing values are present.
- missing_values: Any (str, int, float, None), default = np.nan
The value present in place of a missing value.
- strategy: str (“mean”, “median”, “constant”, “most_frequent”), default = “mean”
The imputation strategy to use.
Returns
result: Dataset having missing values imputed with the strategy
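Examples
The following sketch shows roughly the underlying imputation using plain scikit-learn’s SimpleImputer with the default “mean” strategy; the sample matrix is made up for illustration.
>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>>
>>> data = [[1, 2], [np.nan, 3], [7, 6]]
>>> # The missing value in the first column is replaced by the column mean (4.0)
>>> SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(data)
array([[1., 2.],
       [4., 3.],
       [7., 6.]])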
Stage: processing
Inputs
data: input_data(type: List[List[int]])
missing_values: missing_values(type: Any)
strategy: strategy(type: str)
Outputs
result: output_data(type: List[List[int]])
one_hot_encoder
Official
One hot encoding for categorical data columns
Parameters
- data: List[List[int]]
Data to be encoded.
- categories: List[List[str]]
Categorical values which need to be encoded
Returns
result: Encoded data for categorical values
Stage: processing
Inputs
data: input_data(type: List[List[int]])
categories: categories(type: List[List[Any]])
Outputs
result: output_data(type: List[List[int]])
standard_scaler
Official
Standardize features by removing the mean and scaling to unit variance.
Parameters
- data: List[List[int]]
data that needs to be standardized
Returns
result: Standardized data
Stage: processing
Inputs
data: input_data(type: List[List[int]])
Outputs
result: output_data(type: List[List[int]])
remove_whitespaces
Official
Removes whitespace from the dataset
Parameters
- data: List[List[int]]
The dataset.
Returns
result: dataset having whitespaces removed
Stage: processing
Inputs
data: input_data(type: List[List[int]])
Outputs
result: output_data(type: List[List[int]])
ordinal_encoder
Official
Ordinal encoding for categorical data columns
Parameters
- data: List[List[int]]
Data to be encoded.
- categories: List[List[str]]
Categorical values which need to be encoded
Returns
result: Encoded data for categorical values
Stage: processing
Inputs
data: input_data(type: List[List[int]])
Outputs
result: output_data(type: List[List[int]])
dffml_operations_binsec
pip install dffml-operations-binsec
files_in_rpm
Official
No description
Stage: processing
Inputs
rpm: RPMObject(type: python_obj)
Outputs
files: rpm_filename(type: str)
is_binary_pie
Official
No description
Stage: processing
Inputs
rpm: RPMObject(type: python_obj)
filename: rpm_filename(type: str)
Outputs
is_pie: binary_is_PIE(type: bool)
url_to_urlbytes
Official
No description
Stage: processing
Inputs
URL: URL(type: string)
Outputs
download: URLBytes(type: python_obj)
urlbytes_to_rpmfile
Official
No description
Stage: processing
Inputs
download: URLBytes(type: python_obj)
Outputs
rpm: RPMObject(type: python_obj)
urlbytes_to_tarfile
Official
No description
Stage: processing
Inputs
download: URLBytes(type: python_obj)
Outputs
rpm: RPMObject(type: python_obj)