DataFlow – Data Input/Output Controller

class relogic.logickit.dataflow.DataFlow(config, tokenizer)

DataFlow controls data processing and batch generation.

DataFlow accepts examples either as Structure objects or as JSON objects.

Note: Most of the current implementation is based on the BERT model.

Parameters
  • config (SimpleNamespace) – Configuration for the DataFlow class.

  • tokenizer – Tokenizer for string tokenization.

abstract convert_examples_to_features(examples)

Abstract method for converting examples to features.

endless_minibatches(minibatch_size)

Generate an endless stream of minibatches of the given batch size.
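As an illustration of what an endless minibatch generator can look like, here is a minimal standalone sketch. It does not use the relogic code: the function below, its `examples` and `seed` arguments, and the choice to reshuffle at the start of every pass are all assumptions made for the example, not the library's documented behavior.

```python
import itertools
import random


def endless_minibatches(examples, minibatch_size, seed=0):
    """Yield minibatches forever, reshuffling the examples on every pass.

    Hypothetical standalone helper; the real method is bound to a DataFlow
    instance and takes only minibatch_size.
    """
    order = list(examples)
    if not order:
        raise ValueError("cannot generate minibatches from an empty dataset")
    rng = random.Random(seed)
    while True:
        # Assumption: each pass over the data sees a fresh ordering.
        rng.shuffle(order)
        for start in range(0, len(order), minibatch_size):
            yield order[start:start + minibatch_size]


# The generator never terminates, so the caller decides when to stop.
first_five = list(itertools.islice(endless_minibatches(range(6), 4), 5))
batch_sizes = [len(batch) for batch in first_five]
```

Because the generator has no natural end, a training loop would typically bound it by a step count, as `itertools.islice` does here.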

abstract property example_class

Return the Example class used by the subclass.

get_minibatches(minibatch_size, sequential=False)

Generate a list of minibatches of the given size from the examples.

There are two modes for generating batches. Sequential mode follows the original order of the examples in the dataset. The other mode is based on bucketing, which reduces memory consumption.

Parameters
  • minibatch_size (int) – Batch size.

  • sequential (bool) – Whether to use sequential mode.
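The difference between the two modes can be sketched without the library. The following is an illustrative example only: the helper names, the use of raw token lengths as the bucketing key, and the shuffling of batch order are assumptions, and the actual bucketing strategy in DataFlow may differ.

```python
import random


def sequential_minibatches(lengths, minibatch_size):
    """Batch example indices in their original dataset order."""
    indices = list(range(len(lengths)))
    return [indices[i:i + minibatch_size]
            for i in range(0, len(indices), minibatch_size)]


def bucketed_minibatches(lengths, minibatch_size, seed=0):
    """Batch example indices so each batch holds examples of similar length."""
    # Sort indices by length so neighbours in a batch need little padding.
    by_length = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [by_length[i:i + minibatch_size]
               for i in range(0, len(by_length), minibatch_size)]
    # Assumption: shuffle batch order so short examples are not always first.
    random.Random(seed).shuffle(batches)
    return batches


def padded_size(lengths, batches):
    """Token slots used once every batch is padded to its longest example."""
    return sum(len(batch) * max(lengths[i] for i in batch) for batch in batches)


# Hypothetical token counts for eight examples, alternating short and long.
lengths = [5, 120, 7, 118, 6, 125, 8, 119]
seq = sequential_minibatches(lengths, 2)
buck = bucketed_minibatches(lengths, 2)
seq_cost = padded_size(lengths, seq)    # 964 padded token slots
buck_cost = padded_size(lengths, buck)  # 516 padded token slots
```

In this toy dataset, grouping by length roughly halves the number of padded token slots, which is the kind of memory saving bucketing is meant to provide.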

property minibatch_class

Return the MiniBatch class used by the subclass.

abstract process_example(example)

Basic method for example processing. This method needs to be implemented case by case: each subclass may take different arguments during example processing.
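The subclassing pattern implied by the abstract members can be sketched as follows. This is a simplified stand-in, not the relogic implementation: `SketchDataFlow`, `SentenceExample`, `SentenceDataFlow`, and the `text` field are hypothetical names, the tokenizer is assumed to be a plain callable, and `convert_examples_to_features` is omitted for brevity.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class SketchDataFlow(ABC):
    """Simplified stand-in for a DataFlow-like base class (hypothetical)."""

    def __init__(self, config, tokenizer):
        self.config = config
        self.tokenizer = tokenizer
        self.examples = []

    @property
    @abstractmethod
    def example_class(self):
        """Subclasses return the Example type they work with."""

    @abstractmethod
    def process_example(self, example):
        """Subclasses decide how a single example is processed."""

    def update_with_jsons(self, jsons):
        # Assumption: each JSON object maps directly to constructor arguments.
        for obj in jsons:
            example = self.example_class(**obj)
            self.process_example(example)
            self.examples.append(example)

    @property
    def size(self):
        return len(self.examples)


class SentenceExample:
    """Hypothetical example type holding raw text and its tokens."""

    def __init__(self, text):
        self.text = text
        self.tokens = None


class SentenceDataFlow(SketchDataFlow):
    @property
    def example_class(self):
        return SentenceExample

    def process_example(self, example):
        example.tokens = self.tokenizer(example.text)


# str.split stands in for a real tokenizer here.
flow = SentenceDataFlow(SimpleNamespace(), str.split)
flow.update_with_jsons([{"text": "a b c"}])
```

Instantiating the base class directly raises `TypeError`, which is how Python enforces that every subclass supplies its own `example_class` and `process_example`.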

property size

The size of the dataset.

update_with_file(file_name)

Read JSON objects from a file.

Parameters

file_name (str) – Filename.
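The on-disk format is not specified here. As a rough sketch, assuming the file stores one JSON object per line (an assumption, not something this page confirms), reading it could look like the hypothetical helper below.

```python
import json
import os
import tempfile


def read_json_lines(file_name):
    """Read one JSON object per non-empty line (assumed file layout)."""
    objects = []
    with open(file_name, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                objects.append(json.loads(line))
    return objects


# Write a small temporary file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"text": "first example"}\n{"text": "second example"}\n')
    path = tmp.name

records = read_json_lines(path)
os.remove(path)
```

The resulting list of dictionaries has the shape that update_with_jsons expects as its `examples` argument.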

update_with_jsons(examples)

Convert JSON objects into Examples.

This method can be used in deployment or training.

Parameters

examples (List[Dict]) – List of JSON objects.

update_with_structures(structures)

Convert Structure objects into Examples.

This method is used during deployment.

Parameters

structures (List[Structure]) – List of Structure.