Features

The functions below, from src.features.build_features, cover the feature engineering steps.

src.features.build_features.feature_extraction(dataset, stopwords): Main function that performs all feature engineering

src.features.build_features.get_fasttext(): Load the pretrained French fastText model

src.features.build_features.get_vec(text, model, stopwords): Transform a pandas Series of text into an array holding the vector representation of each sentence, using the fastText model

src.features.build_features.replace_na(dataset, labels): Fill NaN values with ‘na’

src.features.build_features.sent2vec(s, model, stopwords): Transform a sentence into a vector using its fastText representation
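
The source does not show how sent2vec combines word vectors. A common approach, assumed here, is to drop stopwords and average the vectors of the remaining words. The sketch below uses a plain dictionary (toy_vectors, a hypothetical stand-in for the fastText model) and the hypothetical name sent2vec_sketch, so it illustrates the idea only and is not the module's implementation.

```python
from typing import Dict, List, Set


def sent2vec_sketch(
    sentence: str,
    vectors: Dict[str, List[float]],  # hypothetical stand-in for a fastText model
    stopwords: Set[str],
    dim: int,
) -> List[float]:
    """Average the vectors of the non-stopword tokens of a sentence (assumed strategy)."""
    tokens = [t for t in sentence.lower().split() if t not in stopwords]
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No usable token: fall back to a zero vector of the right size.
        return [0.0] * dim
    # zip(*known) iterates over dimensions; average each one across tokens.
    return [sum(column) / len(known) for column in zip(*known)]


toy_vectors = {"chat": [1.0, 0.0], "noir": [0.0, 1.0]}
vec = sent2vec_sketch("le chat noir", toy_vectors, {"le"}, dim=2)
print(vec)
```

Unlike this dictionary lookup, a real fastText model can also build vectors for unseen words from subword units, so the zero-vector fallback would be needed less often.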

src.features.build_features.stack_sparse(components): Stack sparse vectors horizontally [X_1, X_2, ...]
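
To make horizontal stacking concrete, here is a minimal standard-library sketch. It assumes each component is a pair (rows, n_cols) in which every row is a {column: value} dictionary; this representation and the name stack_sparse_sketch are illustrative assumptions, not the module's actual data types.

```python
from typing import Dict, List, Tuple

# Assumed toy representation: (rows as {column: value}, number of columns).
SparseMatrix = Tuple[List[Dict[int, float]], int]


def stack_sparse_sketch(components: List[SparseMatrix]) -> SparseMatrix:
    """Concatenate sparse blocks column-wise: [X_1, X_2, ...]."""
    n_rows = len(components[0][0])
    stacked: List[Dict[int, float]] = [dict() for _ in range(n_rows)]
    offset = 0  # column index at which the current block starts
    for rows, n_cols in components:
        if len(rows) != n_rows:
            raise ValueError("all components must have the same number of rows")
        for i, row in enumerate(rows):
            for col, value in row.items():
                stacked[i][col + offset] = value
        offset += n_cols
    return stacked, offset


x1 = ([{0: 1.0}, {1: 2.0}], 2)  # 2 rows, 2 columns
x2 = ([{0: 3.0}, {}], 1)        # 2 rows, 1 column
rows, n_cols = stack_sparse_sketch([x1, x2])
print(rows, n_cols)
```

With SciPy sparse matrices, scipy.sparse.hstack performs the same operation without a hand-written loop.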

src.features.build_features.to_categorical(dataset, label): Transform a variable to categorical form using one-hot encoding
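
One-hot encoding maps each distinct value to its own 0/1 column. The following sketch shows the idea on a plain list of strings; the name one_hot_sketch and the choice to sort the categories alphabetically are assumptions made for illustration, and the actual function operates on a dataset column.

```python
from typing import List, Tuple


def one_hot_sketch(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the category order and one 0/1 row per input value."""
    categories = sorted(set(values))  # assumed ordering: alphabetical
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one active column per row
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_sketch(["red", "blue", "red", "na"])
print(categories)
print(encoded)
```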

src.features.build_features.to_sparse_int(dataset, label): Transform a variable to integer encoding, in sparse form

src.features.build_features.to_tfidf(dataset, label, stopwords): Compute term frequency–inverse document frequency (TF-IDF) features. TF-IDF reflects how important a word is to a document in a collection or corpus

Parameters:	ngram_range – tuple containing the range of ngram sizes to use.
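
As a rough illustration of what TF-IDF measures and how ngram_range affects the vocabulary, the sketch below uses the textbook weighting tf × log(N / df) with whitespace tokenization. The helper names (ngrams, tfidf_sketch) are hypothetical, and the formula is an assumption: libraries such as scikit-learn apply smoothing and normalization, so their values differ.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def ngrams(tokens: List[str], ngram_range: Tuple[int, int]) -> List[str]:
    """Build all n-grams whose size lies in the inclusive range (lo, hi)."""
    lo, hi = ngram_range
    return [
        " ".join(tokens[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_sketch(
    docs: List[str],
    stopwords: Set[str],
    ngram_range: Tuple[int, int] = (1, 1),
) -> List[Dict[str, float]]:
    """Textbook TF-IDF: (count / document length) * log(N / document frequency)."""
    tokenized = [
        ngrams([t for t in doc.lower().split() if t not in stopwords], ngram_range)
        for doc in docs
    ]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in tokenized for term in set(terms))
    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        weights.append(
            {t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = ["le chat noir", "le chat blanc"]
weights = tfidf_sketch(docs, {"le"})
print(weights[0])
```

In this toy corpus "chat" appears in every document, so its weight is zero, while "noir" appears in only one document and receives a positive weight. Passing ngram_range=(1, 2) would add the bigrams "chat noir" and "chat blanc" to the vocabulary.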