DataFlow – Data Input/Output Controller
- class relogic.logickit.dataflow.DataFlow(config, tokenizer)
DataFlow controls data processing and batch generation.
A DataFlow takes its examples from a Structure or a JSON object.
Note: Most of the current implementation is based on the BERT model.
- Parameters
config (SimpleNamespace) – Configuration for the DataFlow class.
tokenizer – Tokenizer for string tokenization.
- abstract convert_examples_to_features(examples)
Abstract method for converting examples to features.
- abstract property example_class
Return the Example class used by the subclass.
- get_minibatches(minibatch_size, sequential=False)
Generate a list of batches of the given size from the examples.
There are two modes for generating batches. The sequential mode follows the original order of the examples in the dataset. The other mode uses bucketing to reduce memory consumption.
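The difference between the two modes can be illustrated with a minimal sketch. This is not the relogic implementation: the names `make_minibatches` and `padded_size`, the `seed` parameter, the use of plain token lists as examples, and the sort-by-length bucketing strategy are all assumptions made for illustration. It also assumes that the memory saving comes from reduced padding, which the description above does not state explicitly.

```python
import random
from typing import List, Sequence


def make_minibatches(examples: Sequence[List[str]],
                     minibatch_size: int,
                     sequential: bool = False,
                     seed: int = 0) -> List[List[List[str]]]:
    """Hypothetical helper: group examples into minibatches.

    sequential=True keeps the original dataset order.
    sequential=False places examples of similar length in the same batch,
    then shuffles the order of the batches.
    """
    if minibatch_size <= 0:
        raise ValueError("minibatch_size must be positive")
    indices = list(range(len(examples)))
    if not sequential:
        # Assumed bucketing strategy: sort by length so neighbours are similar.
        indices.sort(key=lambda i: len(examples[i]))
    batches = [
        [examples[i] for i in indices[start:start + minibatch_size]]
        for start in range(0, len(indices), minibatch_size)
    ]
    if not sequential:
        # Shuffle whole batches so training does not see lengths in order.
        random.Random(seed).shuffle(batches)
    return batches


def padded_size(batch: List[List[str]]) -> int:
    """Token slots needed when every example is padded to the longest one."""
    return max(len(example) for example in batch) * len(batch)
```

With examples of lengths 1, 9, 2, 8, 1, 9 and a batch size of 2, the sequential batches in this sketch need 52 padded token slots in total, while the bucketed batches need 36.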
- property minibatch_class
Return the MiniBatch class used by the subclass.
- abstract process_example(example)
Basic method for processing an example. It must be implemented case by case: each subclass may take different arguments during example processing.
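To make the abstract members more concrete, the following sketch shows how a subclass might fill them in. It uses a stand-in base class, `SketchDataFlow`, that only mirrors the member names documented here; it is not the real `relogic.logickit.dataflow.DataFlow`. The classes `ToyExample` and `ToyDataFlow`, the `examples` attribute, the `max_length` config field, and the assumption that the tokenizer is a plain callable are all hypothetical.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class SketchDataFlow(ABC):
    """Stand-in for the documented interface; NOT the real relogic class."""

    def __init__(self, config, tokenizer):
        self.config = config
        self.tokenizer = tokenizer
        self.examples = []  # hypothetical storage for loaded examples

    @property
    @abstractmethod
    def example_class(self):
        """Example type used by the subclass."""

    @abstractmethod
    def process_example(self, example):
        """Prepare one example, for instance by tokenizing it."""

    @abstractmethod
    def convert_examples_to_features(self, examples):
        """Turn processed examples into model-ready features."""

    @property
    def size(self):
        return len(self.examples)


class ToyExample:
    """Hypothetical example type holding raw text and its tokens."""

    def __init__(self, text):
        self.text = text
        self.tokens = []


class ToyDataFlow(SketchDataFlow):
    @property
    def example_class(self):
        return ToyExample

    def process_example(self, example):
        # Assumes the tokenizer is a callable from str to a list of tokens.
        example.tokens = self.tokenizer(example.text)

    def convert_examples_to_features(self, examples):
        # Assumed feature format: token lists truncated to config.max_length.
        limit = self.config.max_length
        return [example.tokens[:limit] for example in examples]
```

A usage sketch under the same assumptions: `ToyDataFlow(SimpleNamespace(max_length=2), str.split)` builds a flow whose tokenizer splits on whitespace. Instantiating `SketchDataFlow` directly raises `TypeError`, because its abstract members are not implemented.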
- property size
The size of the dataset.
- update_with_file(file_name)
Read JSON objects from a file.
- Parameters
file_name (str) – Filename.
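The file layout is not specified here. As a rough illustration only, the sketch below assumes one JSON object per line (the JSON Lines convention) and uses the hypothetical helpers `read_json_objects` and `demo`; the actual format expected by `update_with_file` may differ.

```python
import json
import os
import tempfile


def read_json_objects(file_name):
    """Hypothetical reader: yield one JSON object per non-empty line."""
    with open(file_name, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between objects
                yield json.loads(line)


def demo():
    """Write two objects to a temporary file and read them back."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write('{"text": "first example"}\n\n{"text": "second example"}\n')
    try:
        return list(read_json_objects(tmp.name))
    finally:
        os.remove(tmp.name)  # clean up the temporary file
```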