Features
Here are some examples to get you started.
- src.features.build_features.feature_extraction(dataset, stopwords)
  Main function that runs all the feature-engineering steps.
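The following is a minimal sketch of what such a pipeline might look like. It assumes a pandas DataFrame with one free-text column and one categorical column. The column names `description` and `category` and the function name `feature_extraction_sketch` are hypothetical. The sketch uses scikit-learn's `TfidfVectorizer` and `pandas.get_dummies` as stand-ins, and the real function may combine different steps.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer


def feature_extraction_sketch(dataset, stopwords):
    """Hypothetical pipeline: fill NaN, TF-IDF on text, one-hot on a category, stack."""
    # Hypothetical column names; adapt them to the real dataset.
    data = dataset.fillna({"description": "na", "category": "na"})

    # TF-IDF features for the free-text column, ignoring the given stopwords.
    tfidf = TfidfVectorizer(stop_words=stopwords)
    x_text = tfidf.fit_transform(data["description"])

    # One-hot encoding of the categorical column, converted to a sparse matrix.
    x_cat = sparse.csr_matrix(pd.get_dummies(data["category"], dtype=float).to_numpy())

    # Stack the sparse blocks side by side: [X_text, X_cat].
    return sparse.hstack([x_text, x_cat], format="csr")


df = pd.DataFrame({
    "description": ["le chat dort", None, "un chien court"],
    "category": ["a", "b", None],
})
X = feature_extraction_sketch(df, stopwords=["le", "un"])
print(X.shape)  # (3, 8): 5 text terms + 3 categories
```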
- src.features.build_features.get_fasttext()
  Load the pretrained French fastText model.
- src.features.build_features.get_vec(text, model, stopwords)
  Transform a pandas Series of text into an array holding the fastText vector representation of each sentence.
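Here is a sketch of the Series-to-array step. It assumes the model can be indexed like a mapping from words to fixed-size vectors. A real fastText Python model also offers `get_word_vector` and `get_sentence_vector`. The dictionary `toy_model`, the `dim` argument, and the names `sentence_vector` and `get_vec_sketch` are hypothetical. Each sentence is reduced to the mean of its known, non-stopword word vectors, and the per-sentence vectors are then stacked into one 2-D array.

```python
import numpy as np
import pandas as pd


def sentence_vector(sentence, model, stopwords, dim):
    """Hypothetical helper: average the vectors of known non-stopword tokens (zeros if none)."""
    words = [w for w in sentence.lower().split() if w not in stopwords and w in model]
    if not words:
        return np.zeros(dim)
    return np.mean([model[w] for w in words], axis=0)


def get_vec_sketch(text, model, stopwords, dim):
    """Map every sentence of a pandas Series to a vector and stack them row by row."""
    return np.vstack([sentence_vector(s, model, stopwords, dim) for s in text])


# Hypothetical 2-dimensional "model" standing in for fastText.
toy_model = {"chat": np.array([1.0, 0.0]), "chien": np.array([0.0, 1.0])}
series = pd.Series(["le chat", "le chien", "le chat et le chien"])
vectors = get_vec_sketch(series, toy_model, stopwords={"le", "et"}, dim=2)
print(vectors.shape)  # (3, 2)
```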
- src.features.build_features.replace_na(dataset, labels)
  Fill NaN values with the string 'na'.
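Below is a minimal pandas sketch. It assumes `labels` is a list of column names to clean. The function name `replace_na_sketch` is hypothetical. It works on a copy, so the caller's DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd


def replace_na_sketch(dataset, labels):
    """Return a copy of `dataset` in which NaN in the `labels` columns becomes 'na'."""
    dataset = dataset.copy()
    dataset[labels] = dataset[labels].fillna("na")
    return dataset


df = pd.DataFrame({"title": ["chat", np.nan], "price": [np.nan, 3.0]})
out = replace_na_sketch(df, ["title"])
print(out["title"].tolist())  # ['chat', 'na']
```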
- src.features.build_features.sent2vec(s, model, stopwords)
  Transform a sentence into a vector using its fastText representation.
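One common way to build such a sentence vector is to sum the word vectors and L2-normalize the result. The sketch below assumes that approach, along with whitespace tokenization and a dictionary-like model. The name `sent2vec_sketch`, the `dim` default, and `toy_model` are hypothetical. A real fastText model can also produce vectors for out-of-vocabulary words from subword n-grams, so the `w in model` filter only matters for the dictionary stand-in.

```python
import numpy as np


def sent2vec_sketch(s, model, stopwords, dim=2):
    """Sum the vectors of known, alphabetic, non-stopword tokens and L2-normalize."""
    words = [w for w in str(s).lower().split() if w.isalpha() and w not in stopwords]
    vectors = [model[w] for w in words if w in model]  # skip unknown words (dict stand-in)
    if not vectors:
        return np.zeros(dim)  # nothing usable: fall back to a zero vector
    v = np.sum(vectors, axis=0)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


toy_model = {"chat": np.array([3.0, 0.0]), "noir": np.array([0.0, 4.0])}
vec = sent2vec_sketch("Le chat noir", toy_model, stopwords={"le"})
print(vec)  # [0.6 0.8]
```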
- src.features.build_features.stack_sparse(components)
  Stack sparse matrices horizontally: [X_1, X_2, ...].
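Horizontal stacking of sparse blocks can be sketched with `scipy.sparse.hstack`. The function name `stack_sparse_sketch` is hypothetical. All components must have the same number of rows, and the column counts add up.

```python
from scipy import sparse


def stack_sparse_sketch(components):
    """Horizontally stack sparse matrices [X_1, X_2, ...] into a single CSR matrix."""
    return sparse.hstack(components, format="csr")


X_1 = sparse.csr_matrix([[1, 0], [0, 2]])
X_2 = sparse.csr_matrix([[0, 3, 0], [4, 0, 0]])
X = stack_sparse_sketch([X_1, X_2])
print(X.shape)  # (2, 5)
```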
- src.features.build_features.to_categorical(dataset, label)
  Transform a variable into a categorical representation using one-hot encoding.
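One-hot encoding can be sketched with `pandas.get_dummies`. The sketch assumes `label` names a single column, and the function name `to_categorical_sketch` is hypothetical. Each distinct value becomes its own 0/1 indicator column.

```python
import pandas as pd


def to_categorical_sketch(dataset, label):
    """One-hot encode the `label` column: one 0/1 column per distinct value."""
    return pd.get_dummies(dataset[label], prefix=label, dtype=int)


df = pd.DataFrame({"color": ["red", "blue", "red"]})
onehot = to_categorical_sketch(df, "color")
print(list(onehot.columns))  # ['color_blue', 'color_red']
```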
- src.features.build_features.to_sparse_int(dataset, label)
  Transform a variable into an integer encoding in sparse form.
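Integer encoding in sparse form can be sketched with `pandas.factorize` and a SciPy CSR column. The function name `to_sparse_int_sketch` is hypothetical, and the real function may assign codes differently. Here codes follow the order of first appearance, and missing values get -1.

```python
import pandas as pd
from scipy import sparse


def to_sparse_int_sketch(dataset, label):
    """Integer-encode the `label` column and return it as an (n, 1) sparse column."""
    codes, uniques = pd.factorize(dataset[label])  # order of first appearance; NaN -> -1
    return sparse.csr_matrix(codes.reshape(-1, 1)), uniques


df = pd.DataFrame({"brand": ["b", "a", "b", "c"]})
X, uniques = to_sparse_int_sketch(df, "brand")
print(X.toarray().ravel())  # [0 1 0 2]
print(list(uniques))        # ['b', 'a', 'c']
```

Because sparse formats do not store zeros, rows with code 0 take no storage.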
- src.features.build_features.to_tfidf(dataset, label, stopwords)
  Compute term frequency–inverse document frequency (TF-IDF) features, which reflect how important a word is to a document in a collection or corpus.
  Parameters: ngram_range – tuple containing the range of n-gram sizes to use.
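Here is a sketch using scikit-learn's `TfidfVectorizer`, whose `ngram_range` parameter matches the name documented above. `ngram_range` is listed under Parameters but not in the signature, so the sketch assumes it is passed as a keyword argument. The function name `to_tfidf_sketch` is hypothetical. With scikit-learn's defaults, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and each row is L2-normalized. Stopwords are removed before n-grams are formed.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def to_tfidf_sketch(dataset, label, stopwords, ngram_range=(1, 1)):
    """TF-IDF encode the text column `label`; returns the sparse matrix and the vectorizer."""
    vectorizer = TfidfVectorizer(stop_words=stopwords, ngram_range=ngram_range)
    X = vectorizer.fit_transform(dataset[label])
    return X, vectorizer


df = pd.DataFrame({"text": ["le chat dort", "le chien dort", "le chat mange"]})
X, vec = to_tfidf_sketch(df, "text", stopwords=["le"], ngram_range=(1, 2))
print(X.shape)  # (3, 7): 4 unigrams + 3 bigrams
```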