You have a flexible web framework, a powerful database, a kickass ORM, a beautiful API and insightful metrics. So why are you still using inaccurate stemming and simplistic part-of-speech tagging that only works for English?
Ve is a linguistic framework built for programmers. Use advanced language parsing in your app without first spending years studying grammar, math and machine intelligence, or wading through obtuse parser manuals. ZOMG!
Ve sits on top of existing, powerful language parsers and unifies their output into a common interface.
Ve is in alpha stage. It might eat your kids.
A hosted API will be available for an easy out-of-the-box experience. For a small sum.
The source code and technical details are available at github.com/kimtaro/ve.
Ve gives you
- Base form of words
flying → fly
flies → fly
Look, ma! No more inaccurate stemming!
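To see why base forms beat stemming, here is a toy comparison. The suffix-stripping stemmer and the tiny lemma table are hypothetical stand-ins invented for illustration; in Ve the base form comes from the underlying parser's dictionary, not from a hand-written table.

```python
# Toy comparison of suffix stripping (stemming) with dictionary lookup (base forms).
# Both naive_stem and LEMMAS are illustrative stand-ins, not Ve code.

def naive_stem(word: str) -> str:
    """Strip the first matching suffix, the way a crude stemmer would."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary; a real parser ships a full lexicon.
LEMMAS = {"flying": "fly", "flies": "fly", "flew": "fly"}

def base_form(word: str) -> str:
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)

for w in ("flying", "flies", "flew"):
    print(w, naive_stem(w), base_form(w))
# flying fly fly
# flies fli fly
# flew flew fly
```

The stemmer gets "flying" right by luck, mangles "flies" into "fli" and cannot relate the irregular "flew" at all, while the lookup maps all three to "fly".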
- Sentences
- Last year I went to the U.S.A. Next year I will go to S. Korea. → ['Last year I went to the U.S.A.', 'Next year I will go to S. Korea.']
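Sentence splitting is harder than splitting on periods, because abbreviations like "U.S.A." and "S." contain them. The sketch below assumes a hand-made set of abbreviations that never end a sentence; the set, the regex and `split_sentences` are illustrative assumptions, and a real parser's splitter handles far more cases.

```python
import re

# Hypothetical set of abbreviations that, in this toy model, never end a sentence.
# "U.S.A." is deliberately absent, because it can legitimately end one.
NON_TERMINAL_ABBREVIATIONS = {"S.", "Mr.", "Mrs.", "Dr."}

# A candidate sentence end: a token ending in . ! or ? followed by
# whitespace and a capital letter, or by the end of the text.
CANDIDATE_END = re.compile(r"\S+[.!?](?=\s+[A-Z]|\s*$)")

def split_sentences(text: str) -> list[str]:
    sentences = []
    start = 0
    for match in CANDIDATE_END.finditer(text):
        if match.group() in NON_TERMINAL_ABBREVIATIONS:
            continue  # e.g. "S." in "S. Korea" does not end the sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:  # trailing text without final punctuation
        sentences.append(rest)
    return sentences

print(split_sentences("Last year I went to the U.S.A. Next year I will go to S. Korea."))
# ['Last year I went to the U.S.A.', 'Next year I will go to S. Korea.']
```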
- Parts of speech
- I eat gigantic cookies → Pronoun Verb Adjective Noun/Plural
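Underlying taggers emit fine-grained tags, which then have to be collapsed into labels like the ones above. This sketch assumes Penn Treebank-style input tags and a hand-written mapping; the tags a given parser actually emits, and the exact label names Ve uses, may differ.

```python
# Hypothetical mapping from a few Penn Treebank tags to coarse labels like the ones above.
COARSE_LABELS = {
    "PRP": "Pronoun",
    "VB": "Verb", "VBD": "Verb", "VBG": "Verb", "VBN": "Verb", "VBP": "Verb", "VBZ": "Verb",
    "JJ": "Adjective", "JJR": "Adjective", "JJS": "Adjective",
    "NN": "Noun", "NNS": "Noun/Plural",
}

def coarse_tags(tagged: list[tuple[str, str]]) -> list[str]:
    """Map (token, fine_tag) pairs to coarse labels, keeping unknown tags visible."""
    return [COARSE_LABELS.get(tag, f"Other({tag})") for _, tag in tagged]

# Tagger output for "I eat gigantic cookies", written out by hand for this example.
tagged = [("I", "PRP"), ("eat", "VBP"), ("gigantic", "JJ"), ("cookies", "NNS")]
print(" ".join(coarse_tags(tagged)))  # Pronoun Verb Adjective Noun/Plural
```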
- Transliterations
漢字 → kanji
Many languages use non-Latin scripts, or extensions to the a-z alphabet used in English. Being able to convert between scripts is sometimes crucial to an application, and Ve has beautiful support for this.
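Turning 漢字 into "kanji" takes two steps: a parser such as MeCab can supply the reading in katakana (カンジ), and a transliteration table turns that reading into Latin script. The sketch hard-codes the reading and uses a deliberately tiny table, both assumptions for illustration. It also shows the fixed 0x60 offset between the katakana and hiragana Unicode blocks.

```python
# Illustrative kana helpers; the table is deliberately tiny and not a full Hepburn scheme.
KATAKANA_TO_ROMAJI = {"カ": "ka", "ン": "n", "ジ": "ji", "タ": "ta", "ベ": "be", "ル": "ru"}

def katakana_to_hiragana(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to the matching hiragana block."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)

def romanize(reading: str) -> str:
    """Map each katakana character via the toy table; unknown characters pass through.
    A real romanizer also needs digraphs (キャ), small ッ, long vowels and n' before vowels."""
    return "".join(KATAKANA_TO_ROMAJI.get(c, c) for c in reading)

reading = "カンジ"  # reading of 漢字, as a morphological analyzer would supply it
print(katakana_to_hiragana(reading), romanize(reading))  # かんじ kanji
```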
- Just what you need
- Existing language parsers are aimed at researchers and linguists and give far more detailed information than most everyday applications call for. Ve transforms complex data into just what you need.
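As an example of trimming detailed output: MeCab with the IPADIC dictionary prints each token as the surface form, a tab, then nine comma-separated features (part of speech, three subdivisions, conjugation type, conjugation form, base form, reading, pronunciation). The sketch keeps only the few fields most applications need, plus the untouched feature list for the daring. The sample line is written by hand in that format, other dictionaries use different layouts, and `Token` and `parse_ipadic_line` are hypothetical names, not Ve's API.

```python
from dataclasses import dataclass

@dataclass
class Token:
    surface: str
    part_of_speech: str
    base_form: str
    reading: str
    raw: list[str]  # every feature MeCab printed, untouched

def parse_ipadic_line(line: str) -> Token:
    """Reduce one MeCab/IPADIC output line to a handful of fields."""
    surface, _, feature_str = line.partition("\t")
    features = feature_str.split(",")

    # Unknown words may come with fewer features, and "*" means "no value".
    def feature(i: int) -> str:
        return features[i] if i < len(features) and features[i] != "*" else ""

    return Token(
        surface=surface,
        part_of_speech=feature(0),
        base_form=feature(6) or surface,
        reading=feature(7),
        raw=features,
    )

tok = parse_ipadic_line("食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ")
print(tok.base_form, tok.reading)  # 食べる タベ
```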
Why Ve is good
- Language agnostic
- Use the same API for all languages that Ve supports.
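One way to picture "the same API for all languages" is a registry of parsers that all return the same word type, keyed by language code. Everything below (`Word`, `Parser`, `register`, `words` and the toy whitespace parser) is a hypothetical sketch of the idea, not Ve's actual interface.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Word:
    word: str
    lemma: str
    part_of_speech: str

class Parser(Protocol):
    def words(self, text: str) -> list[Word]: ...

_PARSERS: dict[str, Parser] = {}

def register(language: str, parser: Parser) -> None:
    _PARSERS[language] = parser

def words(language: str, text: str) -> list[Word]:
    """Single entry point: callers never touch a language-specific parser directly."""
    try:
        parser = _PARSERS[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None
    return parser.words(text)

class WhitespaceParser:
    """Toy stand-in for a real backend such as FreeLing or MeCab."""
    def words(self, text: str) -> list[Word]:
        return [Word(t, t.lower(), "Unknown") for t in text.split()]

register("en", WhitespaceParser())
print([w.lemma for w in words("en", "Flying Cookies")])  # ['flying', 'cookies']
```

Adding a language then means registering one more backend; application code calling `words(...)` does not change.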
- Programming language agnostic
- Ve aims to be a specification of features and components rather than an implementation. You can write a Ve server in Ruby, Java, Perl or whatever floats your boat and have it drive Ve clients in any language.
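Because Ve is meant as a specification, the contract between server and clients can be plain data. This sketch assumes a hypothetical JSON request with `language`, `function` and `text` fields and answers with JSON that any client language can decode; the field names, `handle_request` and the trivial `words` handler are illustrative only and are not taken from Ve's specification.

```python
import json

# Hypothetical handlers; a real server would call FreeLing, MeCab, etc. here.
HANDLERS = {
    "words": lambda language, text: text.split(),
}

def handle_request(body: str) -> str:
    """Decode a JSON request, dispatch it, and encode the result as JSON."""
    request = json.loads(body)
    handler = HANDLERS.get(request.get("function"))
    if handler is None:
        return json.dumps({"error": f"unknown function: {request.get('function')}"})
    result = handler(request["language"], request["text"])
    # ensure_ascii=False keeps non-Latin text (e.g. Japanese) readable on the wire.
    return json.dumps({"result": result}, ensure_ascii=False)

print(handle_request('{"language": "en", "function": "words", "text": "I eat cookies"}'))
# {"result": ["I", "eat", "cookies"]}
```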
- Non-destructive parsing
- Most parsers spit out only the text they deal with, ignoring whitespace, punctuation and any text they can't process. Ve goes to great lengths to make sure your original input can be reconstructed from its output.
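A minimal way to keep parsing non-destructive is to emit whitespace and punctuation as tokens of their own, so that joining the tokens gives back the exact input. The regex tokenizer below is an illustrative assumption, far simpler than a real parser, but it demonstrates the invariant.

```python
import re

# Every character falls into exactly one alternative, so no input is ever dropped.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")

def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (text, kind) pairs covering the whole input, whitespace included."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]

def reconstruct(tokens: list[tuple[str, str]]) -> str:
    return "".join(text for text, _ in tokens)

sample = "Hello,  world! 漢字です。"
tokens = tokenize(sample)
print(tokens[:4])  # [('Hello', 'word'), (',', 'punct'), ('  ', 'space'), ('world', 'word')]
print(reconstruct(tokens) == sample)  # True
```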
- Some parsers have instabilities and odd interfaces. Ve takes care of weird character encodings and crashing parsers, giving you a stable and predictable interface.
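Crashes and encoding mismatches are easiest to contain at the process boundary. This sketch wraps an external parser binary with an explicit encoding, a timeout and a single error type. The names `run_parser` and `ParserError` are hypothetical, and it assumes the parser reads text on stdin and writes results on stdout, as the `mecab` command does by default.

```python
import subprocess

class ParserError(RuntimeError):
    """Raised when the external parser crashes, hangs, or cannot be started."""

def run_parser(command: list[str], text: str, encoding: str = "utf-8",
               timeout: float = 10.0) -> str:
    """Feed text to an external parser process and return its decoded output."""
    try:
        completed = subprocess.run(
            command,
            input=text.encode(encoding, errors="replace"),
            capture_output=True,
            timeout=timeout,  # the child is killed if it hangs
        )
    except subprocess.TimeoutExpired as exc:
        raise ParserError(f"parser timed out after {timeout}s") from exc
    except OSError as exc:  # e.g. the binary is not installed
        raise ParserError(f"could not start parser: {exc}") from exc
    if completed.returncode != 0:
        stderr = completed.stderr.decode(encoding, errors="replace")
        raise ParserError(f"parser exited with {completed.returncode}: {stderr.strip()}")
    # Replace undecodable bytes instead of crashing on odd output.
    return completed.stdout.decode(encoding, errors="replace")
```

Some MeCab dictionaries are built for EUC-JP rather than UTF-8, which is where the explicit `encoding` argument helps: assuming such an installation, `run_parser(["mecab"], "漢字です", encoding="euc-jp")` would return MeCab's raw output as a normal Python string.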
- Know linguistics? Ve gives you access to the raw output from the underlying parser in case you feel daring enough to dive in.
Ve aims to support all of its features for as many languages as possible, but it's a lot of work. Here are the currently supported functions per language:
| Words with base forms | Parts of speech | Sentences | Transliterations |
Underneath all the Ve functionality lie powerful parsers developed by Real Linguists(tm), and this framework would not be possible without them.
The English functionality is provided by FreeLing, a multi-language parser developed at X University in Spain.
For Japanese the functionality comes from MeCab, developed by Taku Kudo.
Ve is meant for pragmatic applications of linguistics, like search indexing and basic part-of-speech tagging. It's not meant for serious linguistic research. Accuracy depends on each individual parser and on how much simplification is needed to match the Ve specification. Don't use Ve if you need 100% accurate parsing; use humans instead.