Olga Zamaraeva · f4ea0097
--- a/dataset.md
+++ b/dataset.md
+The required data format is [Xigt](https://github.com/xigt/xigt/wiki), with morpheme-level segmentation,
+morpheme-to-gloss alignment, and POS tags.
+
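A quick way to see whether a Xigt-XML file already has the tiers described above is to list, per IGT, which tier types are missing. The sketch below is only an illustration: the function name `missing_tiers` is hypothetical, and the tier type names `morphemes`, `glosses`, and `pos` are assumptions about how your file labels its tiers, so adjust `required` to match your data. It uses only the Python standard library, not the xigt API.

```python
import xml.etree.ElementTree as ET


def missing_tiers(xml_data, required=("morphemes", "glosses", "pos")):
    """Map each <igt> id to the required tier types it lacks.

    xml_data is the content of a Xigt-XML file (bytes or str).
    The tier type names in `required` are assumptions; change them
    if your corpus uses different labels.
    """
    root = ET.fromstring(xml_data)
    missing = {}
    for igt in root.iter("igt"):
        # Collect the type attribute of every tier inside this IGT.
        present = {tier.get("type") for tier in igt.iter("tier")}
        absent = [name for name in required if name not in present]
        if absent:
            missing[igt.get("id")] = absent
    return missing
```

For a file on disk, something like `missing_tiers(open("xigtified_igt.xml", "rb").read())` should return an empty dictionary when every IGT has all of the required tiers.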
+A sample (toy) dataset is included under data/.
+
+There is a fairly reliable toolbox-to-xigt converter.
+
+Make sure that your IGT file is properly encoded as UTF-8 and that
+each IGT in your collection has a unique \ref value, then run the converter:
+
+```
+$ xigt import -f toolbox -i your_toolbox_igt.txt -o xigtified_igt.xml
+```
+(The above assumes you've installed xigt via pip;
+this happens automatically if you used pip to install MOM.)
+
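The two preconditions above (UTF-8 encoding and unique \ref values) can be checked before conversion with a few lines of Python. This is a minimal sketch, not part of xigt or MOM: `check_toolbox` is a hypothetical helper, and it assumes that each \ref marker sits at the start of its own line, followed by whitespace and the value.

```python
from collections import Counter


def check_toolbox(data):
    """Return (is_utf8, duplicate_refs) for the raw bytes of a Toolbox file."""
    try:
        # utf-8-sig accepts UTF-8 with or without a leading byte-order mark.
        text = data.decode("utf-8-sig")
        is_utf8 = True
    except UnicodeDecodeError:
        # Keep going so that duplicate \ref values can still be reported.
        text = data.decode("utf-8-sig", errors="replace")
        is_utf8 = False

    counts = Counter()
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "\\ref":
            value = parts[1].strip() if len(parts) > 1 else ""
            counts[value] += 1

    duplicates = sorted(ref for ref, n in counts.items() if n > 1)
    return is_utf8, duplicates
```

Calling `check_toolbox(open("your_toolbox_igt.txt", "rb").read())` should give `(True, [])` for a file that is ready to convert; a non-empty list names the \ref values that occur more than once.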
+One way to replace the \ref values in your files with line numbers (so that each is unique) is
+to use the awk tool, available on macOS and on Unix systems generally:
+
+```
+$ awk '/^\\ref/{print "\\ref igt" NR;};!/^\\ref/' < original_file.txt > modified_file.txt
+```
+
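On machines without awk, roughly the same renumbering can be sketched in Python. The name `renumber_refs` is hypothetical; like the awk command, it replaces every line that starts with \ref by `\ref igt` plus the 1-based line number and leaves all other lines unchanged.

```python
def renumber_refs(text):
    """Replace each line starting with \\ref by '\\ref igt<line number>'."""
    lines = text.split("\n")
    for index, line in enumerate(lines):
        if line.startswith("\\ref"):
            # Line numbers are 1-based, matching awk's NR.
            lines[index] = "\\ref igt%d" % (index + 1)
    return "\n".join(lines)
```

To apply it to a file, read `original_file.txt` as UTF-8 text, pass the contents to `renumber_refs`, and write the result to `modified_file.txt`.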
+A FLEx-to-xigt converter will be created in the future.
+
+If your dataset does not have morpheme-level segmentation or POS tags, you can use [INTENT][1] to enrich it.
+
+INTENT is available through [this online interface](http://intent.xigt.org/).
+
+If you prefer to install INTENT and run it on your own machine, the command will look something like this:
+
+```
+python3 intent.py enrich xigt_igts.xml igts-enriched.xml --align heur --pos class
+```
+
+If you have trouble setting up or running INTENT, please contact me at olga.zamaraeva@ gee mail and I will try to run it on your data for you.
+
+
+[1]: https://github.com/rgeorgi/intent
\ No newline at end of file