Data Manipulation
There are several approaches to transforming a dataset. Some methods transform the data column by column, while others transform it row by row. HaferML provides utilities for both styles.
Transformer
haferml.etl.transform.pipeline.Transformer is a row-based transformer that converts one row at a time into the data we need. This type of transformation has two main advantages:
- we can simply drop a record if it cannot be used, and
- we can easily stream the data as it comes in.
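Both advantages can be illustrated with a minimal plain-Python sketch. It does not use the HaferML API: `transform_row` and `stream_transform` are hypothetical names, and the `name` and `price` fields are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional


def transform_row(row: Dict) -> Optional[Dict]:
    """Transform one record; return None if the record cannot be used.

    Hypothetical helper, not part of HaferML.
    """
    if row.get("price") is None:
        return None  # unusable record: tell the caller to drop it
    return {"name": row["name"].strip().lower(), "price": float(row["price"])}


def stream_transform(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Lazily transform records one at a time as they arrive."""
    for row in rows:
        transformed = transform_row(row)
        if transformed is not None:
            yield transformed


# Illustrative input; any iterable (file, queue, API cursor) would work.
records = [
    {"name": " Apple ", "price": "1.5"},
    {"name": "Broken", "price": None},
    {"name": "Pear", "price": "2"},
]
result = list(stream_transform(records))
```

Because `stream_transform` is a generator, it never needs the whole dataset in memory, and a bad record only affects itself.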
Ordered Transformer
HaferML does not have a dedicated transformer for column-based transformations. The reason is that we can easily do this using haferml.preprocess.ingredients.OrderedProcessor, or build our own using haferml.preprocess.ingredients.with_transforms and haferml.preprocess.ingredients.order or haferml.preprocess.ingredients.attributes, with each step being a column-based operation.
With pandas, column-based transformations are much easier to write than the row-based method described above.
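As a rough illustration of ordered, column-based steps, here is a minimal pandas-only sketch. It does not use or reproduce the interface of OrderedProcessor, with_transforms, order, or attributes; the step functions and the `name` and `price` columns are hypothetical.

```python
import pandas as pd


def clean_name(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise the whole 'name' column in one vectorised operation."""
    df = df.copy()
    df["name"] = df["name"].str.strip().str.lower()
    return df


def price_to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Convert 'price' to numbers; unparseable values become NaN."""
    df = df.copy()
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    return df


def drop_missing_price(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows that have no usable price."""
    return df.dropna(subset=["price"]).reset_index(drop=True)


# The list order defines the execution order (an assumption of this sketch).
steps = [clean_name, price_to_float, drop_missing_price]

df = pd.DataFrame(
    {"name": [" Apple ", "Broken", "Pear"], "price": ["1.5", None, "2"]}
)
for step in steps:
    df = step(df)
```

Each step operates on entire columns at once, so the logic stays short, but the full DataFrame must be available before the steps run.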