A first demo showing the basic principles of machine translation evaluation and how online interfaces can facilitate the process. This is a very early version (the project started in July; it is developed by Bram Vanroy on a 50% appointment). The final project aims to:
- Create an accessible website to evaluate machine translation for both experts and non-experts
- Create a Python package for machine translation evaluation that incorporates both baseline and state-of-the-art metrics
- Include research-focused functionality such as custom metric options, batch processing, exporting results to formats such as LaTeX and Excel, and visualizing results in meaningful graphs (a sketch of such a workflow follows this list)
- Integrate the website into the CLARIN infrastructure, specifically the CLARIN B-centre of the INT
- Open-source everything
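As a rough illustration of the research-focused workflow mentioned above, the sketch below computes a few baseline metrics and exports them to LaTeX and Excel. It relies on the public sacrebleu and pandas libraries purely as an assumption for illustration; it is not the project's own (still unreleased) API.

```python
# Minimal sketch: baseline metrics plus LaTeX/Excel export with public libraries.
# This is NOT the project's own API, which has not been released yet.
import pandas as pd
from sacrebleu.metrics import BLEU, CHRF, TER

hypotheses = ["The cat sat on the mat.", "He is going home."]
# One reference stream: one reference string per hypothesis.
references = [["The cat sits on the mat.", "He goes home."]]

# Baseline metrics; neural metrics such as COMET or BERTScore could be added similarly.
metrics = {"BLEU": BLEU(), "chrF": CHRF(), "TER": TER()}
scores = {name: metric.corpus_score(hypotheses, references).score
          for name, metric in metrics.items()}

df = pd.DataFrame([scores])
print(df.to_latex(index=False, float_format="%.2f"))  # LaTeX table for a paper
df.to_excel("results.xlsx", index=False)              # Excel export (requires openpyxl)
```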
The current demo is a very first (alpha) indication of the direction we are taking. Improvements that will follow:
- More metrics, both baseline metrics and newer ones. We will especially focus on metrics introduced in the last two years of the WMT Metrics shared task
- Expanding beyond the sentence level. For a first demo it makes sense to show quick sentence-level evaluations, but to make the tool useful for research it will also allow for corpus-level evaluation (see the sketch after this list)
- The Python package and its documentation will be made public, as will the final project's source code
- More visualization options on the website (e.g., visualizing edit distance)
- Integration into the CLARIN infrastructure
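To illustrate the sentence- vs corpus-level distinction mentioned above, here is a small sketch using sacrebleu's BLEU as a stand-in metric (an assumption for illustration; the final tool will support more metrics than this):

```python
# Sentence-level vs corpus-level evaluation, illustrated with sacrebleu's BLEU.
from sacrebleu.metrics import BLEU

hypotheses = ["The cat sat on the mat.", "He is going home."]
references = [["The cat sits on the mat.", "He goes home."]]  # one reference per segment

# Sentence level: one score per segment, handy for quick inspection in a demo.
sent_bleu = BLEU(effective_order=True)  # effective_order avoids zero scores on short segments
for hyp, ref in zip(hypotheses, references[0]):
    print(f"{sent_bleu.sentence_score(hyp, [ref]).score:5.2f}  {hyp}")

# Corpus level: a single score over the whole test set, the standard research setup.
print(f"Corpus BLEU: {BLEU().corpus_score(hypotheses, references).score:.2f}")
```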
Is there anything else you would like to see included? Get in touch!