
Linguistic Information for Hybrid MT

Pangeanic’s development team gathered valuable input from academia on genuine advances in MT at the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011) and at the practical Saturday session ML4HMT (META-NET WP2), held in conjunction with DFKI. These sessions were geared towards developers and anyone with a personal interest in applying the best MT research, in its different flavors, to improve current state-of-the-art systems. Attendees and presenters were academics from the US, the European Union and Japan working in different areas of MT. The theme was state-of-the-art developments in system combination (often involving Moses) and the hybridization of rule-based approaches with statistical ones. Sessions dealt with combined approaches using syntax, grammatical information, rules and statistical systems.

As research teams around the world face the same problems, a range of approaches is beginning to emerge, some similar across teams and others new and imaginative. Examples include:

  • Lemmatisation and annotation for morphologically rich languages such as Czech and Basque, the latter with even fewer available resources.
  • Syntax-based approaches and word reordering for distantly related languages (such as Asian or Semitic languages translated into and out of European languages).
  • Web-based annotation tools.
  • Hybridization of techniques: analysis proceeds from the morphological layer (m-layer) through the analytical layer (a-layer) to the tectogrammatical layer (t-layer), followed by transfer and then synthesis back down through the t-layer, a-layer and m-layer.
  • Word sense disambiguation.
  • Mixture of rule-based and statistical approaches to improve predictability.
  • Post-editing effort estimation for MT systems, comparing systems that use no linguistic features with systems that use some. Linguistic features are directly useful for error detection and for automatic post-editing, but for sentence-level confidence estimation (CE) they raise issues of sparsity and representation.
  • New evaluation metrics such as VERTa, which uses linguistic knowledge organized at different levels (lexical, morphological and syntactic information, and sentence semantics).


Where we are

Pangeanic Headquarters
Av. Cortes Valencianas, 26-5, Office 107
46015 Valencia (Spain)
(+34) 917 94 45 64 / (+34) 96 333 63 33

Flat 8, 279 Church Road
Crystal Palace
SE19 2QQ
United Kingdom
+44 203 5400 256

Castellana 91
Madrid 28046
(+34) 91 326 29 33

One Boston Place
Suite 2600
Boston MA 02108
(617) 621-4084

New York
228 E 45th St Rm 9E
New York, NY 10017-3337

Hong Kong
21st Floor, CMA Building
64 Connaught Road Central
Hong Kong
Toll Free: +852 2157 3950

Ogawa Building 3F
3-37 Kanda Sakuma-cho
Chiyoda-ku, Tokyo

Tomson Commercial Building, Room 316-317
710 Dong Fang Road
Pu Dong, Shanghai 200122, China