Publication information
ScaleText
Authors | |
---|---|
Year of publication | 2017 |
Type | Software |
www | Project repository (not public; access on request, subject to signing an NDA) |
Description | ScaleText version 1.0 is a production-grade software system for scalable semantic search over large document collections. Its core is a vector search engine, implemented as a stand-alone software package that indexes and searches documents using vector representations of text. The vectors are created automatically from plain text by several semantic analysis methods: LSI, LDA, TF-IDF, Doc2vec and Stanford GloVe. Documents pass through several processing stages: preprocessing, segmentation, vectorization, and finally vector encoding and storage. Each stage is implemented by a dedicated component whose output is persisted in a backend database engine. Release 1.0 is a full re-implementation of the entire pipeline in Python 3.5, built to run at scale, and includes a set of top-level scripts for document indexing and a container architecture for deployment into production environments. |
Related projects: |
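Because the ScaleText repository is not public, the following is only a minimal, hypothetical sketch of the kind of pipeline the description outlines (preprocessing, segmentation, vectorization, storage, and vector search). It assumes TF-IDF as the vectorization method (just one of the methods listed above), uses a plain in-memory dict in place of the backend database, and ranks results by cosine similarity. All names (`preprocess`, `segment`, `TfidfIndex`, `max_tokens`) are illustrative assumptions, not ScaleText's actual API.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Hypothetical preprocessing: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def segment(text, max_tokens=50):
    """Hypothetical segmentation: split a document into fixed-size token chunks."""
    tokens = preprocess(text)
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


class TfidfIndex:
    """Toy TF-IDF vector index; dicts stand in for the persistent backend (assumption)."""

    def __init__(self):
        self.segments = {}  # (doc_id, segment number) -> token list
        self.vectors = {}   # (doc_id, segment number) -> sparse vector {term: weight}
        self.idf = {}       # term -> inverse document frequency

    def add(self, doc_id, text):
        # Store each segment of the document under its own id.
        for n, tokens in enumerate(segment(text)):
            self.segments[(doc_id, n)] = tokens

    def build(self):
        # Document frequency: number of segments in which each term occurs.
        df = Counter()
        for tokens in self.segments.values():
            df.update(set(tokens))
        total = len(self.segments)
        self.idf = {t: math.log(total / d) + 1.0 for t, d in df.items()}
        self.vectors = {sid: self._vectorize(toks) for sid, toks in self.segments.items()}

    def _vectorize(self, tokens):
        # TF-IDF weights for known terms, L2-normalized so a dot product is cosine similarity.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, k=3):
        q = self._vectorize(preprocess(query))
        scored = [
            (sum(w * vec.get(t, 0.0) for t, w in q.items()), sid)
            for sid, vec in self.vectors.items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Return the top-k segments that share at least one term with the query.
        return [(sid, score) for score, sid in scored[:k] if score > 0]
```

A usage example: after `index.add("doc1", "Semantic search engines represent text as vectors.")`, `index.add("doc2", "Cats and dogs are popular pets.")` and `index.build()`, the call `index.search("vector search for text")` returns only the segment `("doc1", 0)` with its cosine score. A production system would swap the dict for a database and the TF-IDF step for LSI, LDA, Doc2vec or GloVe vectors, but the index-then-rank structure stays the same.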