Features

Reference for the feature-engineering functions in src.features.build_features.

src.features.build_features.feature_extraction(dataset, stopwords)

Main function that performs all feature engineering on the dataset.

src.features.build_features.get_fasttext()

Load the pretrained French fastText model.

https://fasttext.cc/docs/en/pretrained-vectors.html

src.features.build_features.get_vec(text, model, stopwords)

Transform a pandas Series of text into an array holding the vector representation of each sentence, computed with the fastText model.

src.features.build_features.replace_na(dataset, labels)

Fill NaN values with ‘na’.
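To illustrate the fill step, here is a minimal dependency-free sketch. It assumes the dataset is a list of dict rows and that labels names the columns to clean; the real function presumably works on a pandas DataFrame, which this sketch does not reproduce. The name replace_na_sketch is hypothetical.

```python
import math


def replace_na_sketch(rows, labels):
    """Return copies of the rows with missing values in `labels` set to 'na'."""
    def is_missing(value):
        # Treat None and float NaN as missing (assumption for this sketch).
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        {key: ("na" if key in labels and is_missing(value) else value)
         for key, value in row.items()}
        for row in rows
    ]


rows = [{"title": "chat", "brand": float("nan")}, {"title": None, "brand": "x"}]
cleaned = replace_na_sketch(rows, labels=["title", "brand"])
```

Columns not listed in labels are left untouched, so numeric NaN values elsewhere keep their meaning.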

src.features.build_features.sent2vec(s, model, stopwords)

Transform a sentence into a vector using its fastText representation.
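The reference does not say how word vectors are combined into a sentence vector. A common approach, assumed here, is to average the vectors of the words that remain after stopword removal. In this sketch a plain dict stands in for the fastText model, and sent2vec_sketch and its dim parameter are hypothetical names.

```python
def sent2vec_sketch(sentence, model, stopwords, dim):
    """Average the vectors of the words kept after stopword filtering.

    `model` is a plain dict {word: vector} standing in for a fastText
    model (assumption made for this sketch).
    """
    # Lowercase, split on whitespace, drop stopwords and unknown words.
    words = [w for w in sentence.lower().split()
             if w not in stopwords and w in model]
    if not words:
        # No usable word: fall back to a zero vector of the right size.
        return [0.0] * dim
    # Component-wise mean of the remaining word vectors.
    return [sum(model[w][i] for w in words) / len(words) for i in range(dim)]


toy_model = {"chat": [1.0, 0.0], "noir": [0.0, 1.0]}  # hypothetical 2-d vectors
vector = sent2vec_sketch("Le chat noir", toy_model, {"le"}, dim=2)
```

Applying the same function to every entry of a text column yields one fixed-size vector per sentence, which is the shape of output that get_vec describes.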

src.features.build_features.stack_sparse(components)

Stack sparse vectors horizontally: [X_1, X_2, ...]
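Horizontal stacking places the columns of each block side by side while keeping the rows aligned. The sketch below shows the idea with a hypothetical representation: each matrix is a list of rows, and each row is a dict of its non-zero entries. In practice a library routine such as scipy.sparse.hstack does this job; whether stack_sparse uses it is an assumption.

```python
def hstack_sparse_sketch(components):
    """Horizontally stack sparse matrices given as (rows, n_cols) pairs.

    Each row is a dict {column: value} holding only non-zero entries
    (hypothetical representation chosen for this sketch).
    """
    n_rows = len(components[0][0])
    stacked = [{} for _ in range(n_rows)]
    offset = 0
    for rows, n_cols in components:
        if len(rows) != n_rows:
            raise ValueError("all components need the same number of rows")
        for r, row in enumerate(rows):
            for col, value in row.items():
                # Shift column indices past the blocks already placed.
                stacked[r][col + offset] = value
        offset += n_cols
    return stacked, offset


X_1 = ([{0: 1.0}, {1: 2.0}], 2)  # 2 rows, 2 columns
X_2 = ([{2: 3.0}, {}], 3)        # 2 rows, 3 columns
X, width = hstack_sparse_sketch([X_1, X_2])
```

Only the column indices change; no zero entries are materialised, so the result stays sparse.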

src.features.build_features.to_categorical(dataset, label)

Transform a variable to categorical using one-hot encoding.
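One-hot encoding replaces each category with a row containing a single 1 in the position assigned to that category. A minimal sketch on a plain list follows; sorting the categories to fix the column order is an assumption of this example, and one_hot_sketch is a hypothetical name.

```python
def one_hot_sketch(values):
    """Return the category order and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column of this value's category
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_sketch(["red", "blue", "red"])
```

Each row sums to 1, and the number of columns equals the number of distinct categories.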

src.features.build_features.to_sparse_int(dataset, label)

Transform a variable to integer encoding, stored in sparse form.

src.features.build_features.to_tfidf(dataset, label, stopwords)

Compute TF-IDF features. Term frequency–inverse document frequency (TF-IDF) reflects how important a word is to a document in a collection or corpus.

Parameters: ngram_range – tuple containing the range of n-gram sizes to use.
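To make the weighting concrete, here is a small sketch of the textbook formula, tf = count / number of terms in the document and idf = log(N / df). Libraries often use smoothed or normalised variants, so the numbers will not match a library vectorizer exactly. The function names are hypothetical, and treating ngram_range as an inclusive (min, max) pair is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, ngram_range):
    """Return all n-grams whose size lies in the inclusive ngram_range."""
    low, high = ngram_range
    return [" ".join(tokens[i:i + n])
            for n in range(low, high + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_sketch(documents, stopwords, ngram_range=(1, 1)):
    """Textbook TF-IDF weights, one dict {term: weight} per document."""
    tokenized = []
    for doc in documents:
        tokens = [w for w in doc.lower().split() if w not in stopwords]
        tokenized.append(ngrams(tokens, ngram_range))
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in tokenized for term in set(terms))
    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        weights.append({term: (count / len(terms)) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return weights


docs = ["le chat noir", "le chien noir"]
weights = tfidf_sketch(docs, stopwords={"le"})
```

In this toy corpus "noir" appears in every document, so its weight is zero, while "chat" and "chien" each identify one document and receive a positive weight.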