Ve: A linguistic framework you can use.

veː

You have a flexible web framework, a powerful database, a kickass ORM, a beautiful API and insightful metrics. So why are you still using inaccurate stemming and simplistic part of speech tagging that only works for English?

Ve is a linguistic framework built for programmers. Use advanced language parsing in your app without first taking years of grammar, math and machine intelligence or reading through obtuse parser manuals. ZOMG!

Ve sits on top of existing, powerful language parsers and unifies their output into a common interface.
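
As a rough illustration of what "unifying output into a common interface" could look like, the sketch below maps two made-up parser output shapes onto one shared record type. The `Word` type, both adapter functions, the tag names and both raw formats are assumptions invented for this example. None of them are taken from Ve or from any real parser.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """Hypothetical common record that every adapter returns."""
    surface: str          # the text as it appeared in the input
    lemma: str            # base form of the word
    part_of_speech: str   # simplified tag shared across languages


def from_slash_format(raw: str) -> List[Word]:
    """Adapter for a made-up parser that prints 'surface/lemma/TAG' tokens."""
    tag_names = {"PRP": "pronoun", "VB": "verb", "JJ": "adjective", "NNS": "noun"}
    words = []
    for token in raw.split():
        surface, lemma, tag = token.split("/")
        words.append(Word(surface, lemma, tag_names.get(tag, "unknown")))
    return words


def from_record_format(raw: List[Dict[str, str]]) -> List[Word]:
    """Adapter for a made-up parser that returns one dict per token."""
    tag_names = {"meishi": "noun", "doushi": "verb"}
    return [
        Word(r["form"], r["base"], tag_names.get(r["class"], "unknown"))
        for r in raw
    ]


# Two very different raw outputs end up as the same kind of record.
english = from_slash_format("I/I/PRP eat/eat/VB gigantic/gigantic/JJ cookies/cookie/NNS")
japanese = from_record_format([{"form": "猫", "base": "猫", "class": "meishi"}])

for word in english + japanese:
    print(word.surface, word.lemma, word.part_of_speech)
```

Code that consumes `Word` records never needs to know which parser produced them, which is the point of putting a common interface in front of several parsers.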

Ve is in alpha stage. It might eat your kids.

A hosted API will be available for an easy out-of-the-box experience. For a small sum.

The source code and technical details are available at github.com/kimtaro/ve.

Try Ve

Ruby Example

JavaScript Example

Ve gives you

Base form of words
flying → fly
flies → fly
Look, Ma! No more inaccurate stemming!
Sentences
Last year I went to the U.S.A. Next year I will go to S. Korea. → ['Last year I went to the U.S.A.', 'Next year I will go to S. Korea.']
Parts of speech
I eat gigantic cookies → Pronoun Verb Adjective Noun/Plural
Transliterations
漢字 → kanji
Many languages use non-Latin scripts, or extend the basic a-z alphabet used in English. Being able to convert between scripts is sometimes crucial to an application. Ve has beautiful support for this.
Just what you need
Existing language parsers are aimed at researchers and linguists and give far more detailed information than most everyday applications call for. Ve transforms complex data into just what you need.
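
To make the base-form example above concrete (flying → fly, flies → fly), here is a minimal sketch contrasting naive suffix stripping with a dictionary lookup. Both functions and the two-entry `BASE_FORMS` table are illustrative assumptions. A real parser derives base forms from a full lexicon and grammar, not from a hand-written table.

```python
def naive_stem(word: str) -> str:
    """Strip the first matching suffix; no linguistic knowledge involved."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


# Tiny hand-written stand-in for the lexicon a real parser would consult.
BASE_FORMS = {"flying": "fly", "flies": "fly"}


def base_form(word: str) -> str:
    """Look the word up; fall back to the word itself when it is unknown."""
    return BASE_FORMS.get(word.lower(), word)


for word in ("flying", "flies"):
    print(word, "stem:", naive_stem(word), "base form:", base_form(word))
# flying stem: fly base form: fly
# flies stem: fli base form: fly
```

The stemmer happens to get "flying" right but turns "flies" into "fli", which is the kind of inaccuracy a base-form lookup avoids.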

Why Ve is good

Language agnostic
Use the same API for all languages that Ve supports.
Programming language agnostic
Ve aims to be a specification of features and components, rather than the implementation itself. You can write a Ve server in Ruby, Java, Perl or whatever floats your boat and have it drive Ve clients in any language.
Non-destructive parsing
Most parsers spit out only the text they deal with, ignoring whitespace, punctuation and text they can't process. Ve goes to great lengths to make sure your original input can be reconstructed from its output.
Protection
Some parsers have instabilities and odd interfaces. Ve takes care of weird character encodings and crashing parsers, giving you a stable and predictable interface.
Access
Know linguistics? Ve gives you access to the raw output from the underlying parser in case you feel daring enough to dive in.
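
The non-destructive parsing idea above can be sketched with a tokenizer that keeps whitespace and punctuation as tokens of their own, so that joining the tokens gives back the input exactly. This is a minimal sketch of the property, not Ve's implementation. The pattern, the token kinds and the `tokenize` function are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Every character is a word character, whitespace, or neither,
# so the three alternatives together cover the whole input.
TOKEN_PATTERN = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<other>[^\w\s]+)")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs without discarding any character."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


text = "I eat  gigantic cookies!\n"
tokens = tokenize(text)

# The words are easy to pick out...
words = [token for kind, token in tokens if kind == "word"]
print(words)  # ['I', 'eat', 'gigantic', 'cookies']

# ...and the original input, double space and newline included, is recoverable.
assert "".join(token for _, token in tokens) == text
```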

Support chart

Ve aims to support all of its features for as many languages as possible, but it's a lot of work. Here are the functions currently supported for each language.

Language | Words with base forms | Parts of speech | Sentences | Transliterations
English
Japanese (transliterations listed below)
  • Hiragana and Katakana to Latin
  • Hiragana to Katakana
  • Katakana to Hiragana
  • Latin to Hiragana
  • Kanji to Hiragana
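
Of the transliterations listed above, Hiragana to Katakana and back is the simplest to sketch, because the two Unicode blocks are laid out in parallel: each Katakana letter sits 0x60 code points after its Hiragana counterpart. The function names below are hypothetical and are not Ve's API. Kanji to Hiragana, by contrast, depends on dictionary readings supplied by a parser and cannot be done with arithmetic like this.

```python
KANA_OFFSET = 0x60  # distance between the Hiragana and Katakana blocks


def hiragana_to_katakana(text: str) -> str:
    """Shift Hiragana letters (U+3041 to U+3096) into the Katakana block."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift Katakana letters (U+30A1 to U+30F6) into the Hiragana block."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
print(katakana_to_hiragana("カタカナ"))  # かたかな
```

Characters outside the two ranges, such as Kanji, Latin letters and the long vowel mark ー, pass through unchanged.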

The parsers

Underneath all the Ve functionality lie powerful parsers developed by Real Linguists(tm), and this framework would not be possible without them.

The English functionality is provided by FreeLing, a multi-language parser developed at X University in Spain.

For Japanese the functionality comes from MeCab, developed by Taku Kudo.

Caveats

Ve is meant for pragmatic applications of linguistics, like search indexing and basic part of speech tagging. It's not meant for serious linguistic research. Accuracy depends on each individual parser and on the amount of simplification needed to match the Ve specification. Don't use Ve if you need 100% accurate parsing; use humans instead.