Introducing Eritrean Languages to Optical Character Recognition Technology

By Sabrina Solomon

Our guest today is Issayas Tesfamariam, an Eritrean living in the US. Issayas discussed a range of topics with us, including where our languages are heading in the 21st century vis-à-vis the advancement of technology and the digital world. The following is a recap of the interview.

*  *  *

 

Please tell us about your career.

I currently live in the US. I received my graduate degree in Asian Pacific studies. I currently work at a research center at Stanford University. I also work as a logistics coordinator at the largest private archival collection in the US that collects materials from the Horn of Africa. On top of these, I have been working as a coordinator at the National Public Diplomacy Group for around a year and a half.

 

I had a blog for about ten years, which has now grown into a website called Kemey.com. The website mainly focuses on old documentaries that I have researched, covering Eritrean history, places and cities, people and more. I have also written a book in Tigrinya that college students across the US who are keen on learning the language can use before their final year of study.

 

How did it all start?

When the computer science students finish their fifth year and prepare their senior papers, the work is only mentioned in the news or briefly reported, and no one really understood the impact and weight of those papers. In collaboration with the University of California, Los Angeles (UCLA), we decided to give them more attention. The initiative was taken by the university, and professionals such as Dr. Biniam (Google Tigrigna translator), Yemane Russom (the early Ge’ez font creator) and many others held a four-day symposium focusing on the “Tigrigna digital initiative”. Later, a call for papers was issued, and students submitted abstracts on their earlier work, including both finished and unfinished projects. I then worked with the students here, and many were able to participate from this country. With Dr. Gideon, we held a follow-up for those who were unable to participate due to internet problems. Many other PhD graduates from other countries also participated. The students in this country were able to send their work as recorded videos, with the help of the UN in uploading them. Question-and-answer sessions were then held over Zoom.

 

Any extraordinary works from the Eritrean students and graduates?

 

Many creations and works were presented. One of the greatest papers presented was on a program called LETAI, which transcribes voice to text. Named in the style of Siri and Alexa, LETAI is an acronym for Listening, Enhancing, Transcribing Artificial Intelligence, and it transcribes everything you input by voice in Tigrigna. There were no errors at all in the transcription, thanks to the Tigrigna data that had been effectively loaded into the system. The only problem they have is that the computers they use aren’t powerful enough to store or upload all the data. So, unlike the other programs, LETAI doesn’t transcribe while you’re talking. Transcribing in real time requires a more powerful computer; with LETAI, you have to record what you say and then transcribe the recording instead of getting the text directly. They used a collection of open-source resources and created their own program. People who have stories to tell but can’t write can now use LETAI, and an entire book can be written this way. That means no story has to vanish without being documented.

 

What differences did you notice between the students in other countries and ours?

There is a huge difference between the students here and there. The students here are way ahead, even though they have limited equipment and face electricity interruptions. The students and graduates here usually focus first on helping deaf or blind people (people with disabilities in general) and on commercial uses afterwards, so basically everything is incorporated when they’re designing the tool. In contrast, the PhDs in other countries concentrate more on the commercial side and then keep adding features after complaints from such communities, which is basically an afterthought; the foundation isn’t built for everyone.

 

What exactly are the students doing for the benefit of our language?

They’re basically teaching the machine the Tigrigna language with programs like OCR (optical character recognition) and Tigrigna sentiment analysis. Tigrigna sentiment analysis, for instance, is a program where the machine tells you whether what you write is positive or negative in tone. It needs a lot of input data, though. For instance, there is a corpus of around 100 million words of spoken English. A corpus of 40,000 Tigrigna words was built by Dr. Yamane Keleta, who is an advisor and coordinator for the computer science students preparing senior papers. It’s not much compared to English, but it’s not bad to begin with. There are around 100,000 words now. Scanning newspapers from the Ministry of Information and materials from the research and documentation collections makes the corpus bigger, and that is how the programmers built it too.
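
As a rough illustration of how a corpus size like the ones quoted above might be measured, here is a minimal Python sketch. It is not the students' actual tooling: the function name `corpus_stats`, the sample pages, and the choice to treat any run of Ge’ez-script letters (Unicode U+1200 to U+135A) as one word are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Ethiopic (Ge'ez script) letters,
# U+1200 to U+135A. Ethiopic punctuation such as the full stop (U+1362)
# falls outside this range and acts as a separator.
ETHIOPIC_WORD = re.compile(r"[\u1200-\u135A]+")


def corpus_stats(pages):
    """Return (total_tokens, distinct_words, counts) for an iterable of page texts."""
    counts = Counter()
    for page in pages:
        counts.update(ETHIOPIC_WORD.findall(page))
    return sum(counts.values()), len(counts), counts


# Hypothetical OCR output for two scanned pages.
sample_pages = ["ሰላም ኤርትራ።", "ሰላም ትግርኛ።"]
total, distinct, counts = corpus_stats(sample_pages)
print(total, distinct, counts.most_common(1))
```

The sketch reports both the total number of word tokens and the number of distinct words, since the interview does not say which of the two the 40,000 and 100,000 figures refer to.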

Even though computer science is technical, the Tigrigna language input should come from linguists of the Semitic languages, such as Ge’ez and Tigrigna, because those students are programmers, not linguists. So, one thing Dr. Gideon did while he was here was gather artists, writers and people who have made digital dictionaries to work toward a more common and precise language. Standardization is really important when it comes to language.

We’re trying to establish a standard Tigrigna that can be applied across society. Naming, the usage of reversible phrase words, and the written forms of words all need standardization. Until a standard, correct way of writing every single Tigrigna word is set, the work should continue as it is, since that is the best way for now.

 

What’s the main question that you are asking now?

Where’s Tigrigna in the 21st century when it comes to technology? Are we behind, are we ahead, or are we in parallel? We have to explore to know this. Who’s doing what and where? We have some idea of where it is now. The important thing is that there are around 40 cohorts with around 5 members each. Now how can we reinforce that and take it to another level? They’re doing well in artificial intelligence; they only need equipment and materials.

742 books were printed, which amounts to one terabyte of data that you can draw information from. The Tigrigna newspaper from the year 2000 was also scanned, which can be a great source of data and a boost to the corpus.

The youth have understood the ideas the West has arrived at; now they just need the virtual tools, and maybe they can add to them. There is indeed a lot that needs to be added to the data, such as more corpus material and variations like accents. But we’re still on the right path.

 

Is there anything that you’d like to add?

If we don’t tell our stories, we’ll be boxed in. And it’s hard to get out once you’re boxed in, because you get boxed in based on a lie, and you have to debunk the lie. What we need to do is tell our history so that it continues through the generations; the chain of our collective memory shouldn’t be broken. And when we tell our story, our enemies are the ones who are going to be boxed in.

We have a comparative advantage in archeology, especially marine archeology. It is part of our history and it’s our power. We should pass it on through the generations and never break the transmission, as it’s our identity. We are now embarking on this work with the Tigrinya language, but in the near future we need to include other Eritrean languages.

Also, networking among the youth is really important, especially networking through professions. The linkage should be smooth so we can have a smooth transition, because nothing can be done haphazardly. And there should be cohesion so that we can leave our signature. The best is yet to come.

Thank you.

 

Thank you!