12⁄4
I want to make a concerted effort to learn Albanian as well as I can before my wedding.
My goal is to put in time every day during one or more of breakfast/morning, lunch, and dinner/night. In order for it to start sticking, I think I need to be able to maintain the habit and have fun.
The textbook follows the structure of a typical intro language course: it starts with some basic phrases, introduces countries/nationalities, introduces “to be” in the present tense with all of its conjugations, etc.
Evidently, I was not able to build a routine centered around this textbook. The table breaking down the phonetics was good though and helped me have confidence that reading wouldn’t be a complete waste of time / actively harmful in teaching me the wrong way of pronouncing things. I got through a few lessons but the subject matter was immensely boring and I disliked how unrealistic and robotic the whole exercise was. For example, you are given a table of all of the nationalities and languages and you fill out dozens of sentences substituting different characters / subjects into the same template. This didn’t seem like it was advancing me toward my goal of understanding what Marsela’s family and friends were saying, so it was hard to stay motivated.
I was unable to find a single movie or show in Albanian. I only found a site with Albanian subtitles for English-language movies. I don’t think I could get value from these; I don’t seem to have the ability to map the words of an unfamiliar language onto the spoken English. So not useful for me.
I also tried looking for films made by Albanians but was unable to find anywhere to purchase or stream the ones I found.
Apparently there are ZERO Albanian-language books on Kindle or Audible. I spent a lot of time looking for any ebooks and was unsuccessful. I even found webpages, indexed by Google, that were supposedly lists of the “Best Albanian Books,” and they were broken-looking blank pages. Searching within the Albanian books category similarly yields “There are no books in this category.”
Around the time I got the textbook, I also had the idea that it would be fun to set the goal of reading books entirely in Albanian. At the time I thought it would be a good supplement to the textbook learning and that at some point I might be able to switch over and read a book. I got one children’s book (for a small child) and the first two Harry Potter books. The thought was that I would graduate to each book in turn and be able to read it fairly casually at some point. But since I fell out of the workbook, I never got to the level I needed to read them.
I still think learning through reading is a good idea. For me it is a great way to acquire vocab in a natural way that lets me go at my own pace. Initially, spoken Albanian feels out of reach because I just can’t recognize enough vocab to make any kind of guess at what is being said. It comes at me too fast, with such a low fraction of words understood, that there is just no chance to infer what an unknown word might mean. With reading, I have a chance to stop at every word, wonder what it means, consult a translation, and form a memory of the context in which the word was used.
So four days ago I dusted off my copy of Harry Potter dhe Guri Filozofal and have been going word by word through the book using the following procedure:
Between steps 1 and 2, I try to remember whether I have seen the word before and predict whether I will find it in the index. Usually I can correctly recognize words I’ve seen before. And if I do think I’ve seen the word, I rack my brains to try to come up with the translation before seeing it in the list.
When confronting each word, I also make an effort to say the word aloud (both when reading it and when I pair it with the translation). I also try to feel it affecting the adjacent words, to build an intuition for the grammar.
It is a slow process, but that’s good because it gives me many opportunities to experience the words. So far I am only almost done with the first page! I repeat: I started this four days ago and have spent close to 4 hours doing this exercise.
I imagine that as time progresses and I encounter common words repeatedly, I will know them well enough to skip steps 2-5, simply recognizing them and moving on to the next. Even on the first page, there are words like që and të that are so ubiquitous that I have already seen each of them repeatedly. Over time I will go from reading a page every four days to a page a day to four pages a day, etc.
Still, I think this process has some slow parts that oughtn’t be slow. The main one is the step of seeing a word, going to my phone, and getting the translation. In practice it can take north of 30 seconds per word, given that I need to switch between book and phone, keep my place in the book, and type in the word (which is harder when using the Albanian keyboard with its slightly altered layout, or long-pressing on the English keyboard). At 300 words per page and 15-30 seconds per lookup, that is 75-150 minutes per page of waste! Certainly some of that time is building a relationship with the word, but I would rather spend that time intentionally and not jumping through hoops.
This setup also involves three devices (phone for translation, book for source material, and notebook for recording). These are three physical objects that I almost literally need to juggle to make it work. This is slow, and it’s also not portable. I can’t take it with me on the go or switch to it during in-between moments while working.
In order to learn as much and as quickly as I humanly can, I need to optimize the loop I outlined above, and maybe also expand the situations in which I can perform it.
I need to get this book in digital form. It’s important that it is this book, as opposed to some random online text, for two reasons I think:
Unfortunately, as I mentioned earlier, there are no Albanian eBooks, at least that I can find. So strike one. I will need to find a way around this. Unfortunately part deux: it is quite cumbersome to get translations on iOS. For one, Albanian is not a language that iOS Translate supports, so I need to use Google Translate. This also means that if I were to use the Kindle reading app, it doesn’t support Albanian-to-English translation either. What’s more, the translate function is buried in both cases. In iOS, you need to highlight, wait for the menu to show, tap to see more options, press translate, select a language, oh yeah no Albanian, so you actually need to copy and paste and oof. For Kindle it is not much better. There is also no Albanian available, but to even get to it you need to swipe twice past dictionary and Wikipedia before getting to the translate modal. That sucks big time.
But hmm, what did we swipe past? The dictionary seems promising. You seem to be able to select a dictionary to look words up in, but sadly again no Albanian-to-English option. At the bottom it seems like you can add custom dictionaries somehow. I tried searching on Amazon/Kindle for an Albanian-English dictionary, but I’m not convinced that the paltry dictionary options I saw would be in the right format to integrate. And what if they were missing words that I wanted to know?
By now I’m getting a foggy vision of how I could make my Albanian learning much, much smoother, but there are some significant hurdles in the way. To recap:
But I still see the potential. With an ebook copy of the book I’m reading, along with a complete dictionary, looking up a new word would be reduced to tapping on it in Kindle and reading the translation. Call it 2 seconds, for a savings of roughly 90% of the time spent looking up words. Too good to resist. If I could come up with a solution in a few hours, the savings would pay for themselves in a few days.
If I can’t find an ebook copy to buy, I’ll just have to make my own.
I have a hard copy on my desk. It has roughly 250 pages. Looking into how Google Books digitized books back in the day, it seems like they built a special camera for their purposes and made a machine that could flip through the pages, taking a picture of each one. According to the Google Books Wikipedia article, they were clocking in at 300 pages per 40 minutes when they initially started, increasing to 1000/hr later. Seems kind of weak sauce. At a rate of one page snapped every 4 seconds, you achieve 900/hr, and four seconds seems plenty. Then they had some complicated system to digitally flatten the pages. This mattered for them since they were going to digitize everything about the page (images, typeface, layout, etc.), whereas all I really care about is the text and simple formatting. So I shouldn’t have to get crazy with transforming the images. I should be able to collect all the images in 20 minutes or less. I’ll report back when I go for it…
With the pages digitized, I will need to extract the text using some method of OCR. OSX has built-in character recognition, but I’m not sure how good it is or how easy it is to program with. Looking at other options, it seems like there are a couple of LLMs on Hugging Face for OCR, but it seems like it takes some effort to get them working, and since I am GPU poor I’m going to hold off on that for now. I also had a poke around with “open google book scanning github” type queries to see what people are doing. The main options I found are a bit outdated and look like they could be a pain. They also put a lot of emphasis on preprocessing the images to align them, scale them, unfold them, etc., steps that I imagine are laborious and strictly unnecessary for this project.
I remember there being some sort of OCR available as a service on GCP and briefly looked into using that, but I could only find their “Document AI” thing, which seemed more oriented around getting structured data out of documents and looked complicated to use. At that point I wondered if I could just give the image to ChatGPT and have it do it. So I chucked it into 4o and it did a decent job. I also tried 4o-mini and they had similar results, but both missed some things. I went over to GCP land and tried gemini-flash-002, and it was also pretty good. gemini-1.5-pro-002 was even better and, based on my cursory glance, didn’t have any issues. I did some napkin math and figured, based on the token cost of transcribing each image (a few hundred input and a few hundred output), that doing the whole book with flash or pro would cost only pennies all in. And the implementation would be extremely simple, just calling the API with each image.
So I think using Gemini will be the best route for the OCR. It even has the added benefit that it can intuit what to do with the drop caps, filter out the page number, preserve new-paragraph indentation, and rejoin hyphenated words, all with a few words in the prompt. Yeah, that is going to be so much easier and higher quality than basic OCR would give.
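For reference, the per-page call is roughly the following. This is a minimal sketch using the @google/generative-ai Node SDK; the prompt wording, file path, and environment variable are placeholders rather than exactly what I ran.

```js
// Minimal sketch of OCR-by-Gemini for one page photo.
// Assumptions: GEMINI_API_KEY is set and the pages are JPEGs.
import { readFileSync } from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro-002" });

async function transcribePage(imagePath) {
  const prompt =
    "Transcribe the Albanian text on this book page. Ignore the page number, " +
    "expand the drop cap into a normal letter, rejoin words hyphenated across " +
    "line breaks, and keep paragraph breaks as blank lines.";
  const result = await model.generateContent([
    prompt,
    {
      inlineData: {
        data: readFileSync(imagePath).toString("base64"),
        mimeType: "image/jpeg",
      },
    },
  ]);
  return result.response.text();
}
```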
So assuming I can grab the formatted text for each page of the book, I next need to get it onto the Kindle. Poking around, I found that you can use the “Send to Kindle” feature to drag and drop a file and have it appear in your Kindle library. There are a bunch of supported formats, but some are proprietary and some don’t support reflowing text. .rtf seems like a decent option since it is simple and will give me a decent experience on my phone. It should be pretty straightforward to stitch all of the pages together into a single file with a script and then upload the resulting .rtf to Kindle.
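As a sketch of that stitching step (assuming the transcripts land in a transcripts/ folder as one text file per page, which is just how I’d lay it out), something like this produces a single minimal .rtf, with non-ASCII letters like ë and ç written as RTF \uN? escapes:

```js
// Stitch per-page transcripts into one minimal .rtf for Send to Kindle.
import { readdirSync, readFileSync, writeFileSync } from "node:fs";

// Escape RTF control characters, then encode anything non-ASCII as \uN?.
const escapeRtf = (s) =>
  s
    .replace(/[\\{}]/g, (c) => "\\" + c)
    .replace(/[^\x00-\x7F]/g, (c) => `\\u${c.charCodeAt(0)}?`);

const pages = readdirSync("transcripts").sort(); // e.g. page_001.txt, page_002.txt
const body = pages
  .map((f) => readFileSync(`transcripts/${f}`, "utf8"))
  .join("\n")
  .split("\n")
  .map((para) => escapeRtf(para) + "\\par ")
  .join("\n");

writeFileSync(
  "libri.rtf",
  `{\\rtf1\\ansi\\uc1\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24\n${body}\n}`
);
```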
To test this out, I copied what Gemini gave me into a quick new .rtf file and uploaded it. Sure enough, I’m able to view the “book” in the Kindle app on my phone. And I confirmed I can use the dictionary feature!
At this point, I need to ensure that if I go about digitizing the book, I will be able to get a dictionary into the app as well, otherwise what is the point? So the next step is to read the KDP guide on creating dictionaries. At first glance it seems doable but a bit more intricate. A cool bonus is that I’ll be able to include not just a definition/translation but also examples.
For my dictionary, I want to ensure that all of the words in the book get translations. This should be doable by splitting my transcribed book into words and then hitting some translation service for each one. Google Cloud Translation seems like an option. Using their NMT variant, the cost is $20 per million characters. That’s not insane, but I’d also like to see how gemini-pro fares, since I think it could be cheaper and possibly have greater contextual awareness from the surrounding sentence. If the translations for the same word differ across sentences, I could represent that in my dictionary.
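A rough sketch of what that bulk lookup could look like with the Cloud Translation v2 client; the chunk size, language codes, and output shape here are my assumptions, not a settled design:

```js
// Bulk-translate the book's vocabulary, Albanian ("sq") to English ("en").
import translatePkg from "@google-cloud/translate";

const { Translate } = translatePkg.v2;
const translate = new Translate(); // uses application default credentials

async function translateWords(words) {
  const entries = {};
  // The v2 API accepts an array of strings, so translate in chunks.
  for (let i = 0; i < words.length; i += 100) {
    const chunk = words.slice(i, i + 100);
    const [translations] = await translate.translate(chunk, { from: "sq", to: "en" });
    chunk.forEach((w, j) => (entries[w] = translations[j]));
  }
  return entries;
}
```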
Having a corpus of Albanian text will also allow me to compute some statistics about which words are most common as well as which words co-occur most frequently. A list of common words could be good to study early on.
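Computing those stats from the transcribed text is simple enough; a naive sketch (lowercasing and stripping punctuation, nothing smarter):

```js
// Word frequency and a crude adjacent-word co-occurrence count over the transcript.
const tokenize = (text) =>
  text.toLowerCase().replace(/[^\p{L}\s]/gu, "").split(/\s+/).filter(Boolean);

function wordStats(text) {
  const words = tokenize(text);
  const freq = new Map();
  const bigrams = new Map();
  for (let i = 0; i < words.length; i++) {
    freq.set(words[i], (freq.get(words[i]) || 0) + 1);
    if (i > 0) {
      const pair = `${words[i - 1]} ${words[i]}`;
      bigrams.set(pair, (bigrams.get(pair) || 0) + 1);
    }
  }
  const top = (m, n) => [...m.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
  return { topWords: top(freq, 50), topBigrams: top(bigrams, 50) };
}
```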
12⁄7
Kindle is not letting me send my epub to my Kindle even though it previews fine in Kindle Previewer 3. They also don’t let you send the mobi format, even though that is what they suggest you sideload. This lame challenge has me thinking that I should ditch Kindle for my own app. The point of having a workflow that works with Kindle would be to easily apply it to new Kindle books, but there are no Albanian Kindle books, soo…
note to self:
12⁄7
Stats:
Something spooky is going on. I tried running the same prompt on Vertex AI as I did at the beginning of the investigation and the results were garbage. It just made up some random text. Super strange. Were they actually handing me off to a next-gen model before and then stopped that test? Are they switching me to a weaker model now due to high traffic?
I removed the part of the prompt that tells the model the page is in Albanian, and the response stopped even looking Albanian. Something is up. In the UI they show a thumbnail version of the image, which got me thinking that maybe the model is only getting access to a really scaled-down version where no text is legible. I opened the “Get Code” panel and saw that the base64-encoded image in there is only ~9k characters long. Wayy smaller than my image. Looking at the size of the thumbnail, it roughly matches. So potentially a bug in their UX that they introduced recently. I’ll try to proceed with using the API with the large image to see if it works like before. It also has me wondering if the miraculously low token counts it was showing were based on a thumbnail-sized image… hope not.
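The first sanity check before blaming the model is just to compare sizes: how long is the base64 payload for the full-resolution photo versus the ~9k characters the “Get Code” panel showed? (The file path here is hypothetical.)

```js
// Compare the base64 length of the full-resolution page photo with the ~9k
// characters shown in the console's "Get Code" panel.
import { readFileSync } from "node:fs";

const b64 = readFileSync("pages/page_001.jpg").toString("base64"); // hypothetical path
console.log(`full-resolution base64 length: ${b64.length} characters`);
```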
<coding everything up…>
12⁄14
I was right about the image size issue and was able to write a script to transcribe all of the images using Gemini. After compiling all of the pages together, I noticed that the results were not as good as I’d hoped. There were lots of inconsistencies in how the transcription handled text wrapping, and almost all of the pages featuring a partial view of the neighboring page included that marginal text, despite my instructing the model to ignore it. If it were just the problem of the secondary pages being transcribed, I probably would have reshot the photos, being more careful to ensure only the primary page was visible, but Gemini also had a hard time consistently following the wrapping instructions I’d given it. I might have been able to help it out with an example page formatted correctly, but I thought it would be worth trying gpt-4o. I added a function to transcribe with gpt-4o instead and got much better results. It had no issue ignoring the secondary pages, and the formatting was much more consistent. It might make slightly more character/word errors, but it is workable and I was able to move forward with the rest of the project.
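The gpt-4o version of the transcription function is essentially a drop-in swap. This is a sketch with the OpenAI Node SDK; the exact prompt I used differed, and the path is a placeholder.

```js
// Same per-page transcription, but via gpt-4o.
// Assumes OPENAI_API_KEY is set in the environment.
import OpenAI from "openai";
import { readFileSync } from "node:fs";

const openai = new OpenAI();

async function transcribePageWithGpt4o(imagePath) {
  const b64 = readFileSync(imagePath).toString("base64");
  const resp = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Transcribe only the main page of Albanian text. Ignore any " +
              "partially visible neighboring page and the page number.",
          },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${b64}` } },
        ],
      },
    ],
  });
  return resp.choices[0].message.content;
}
```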
Side note: I was doing that last weekend, and since then Gemini 2 has been released, so I might rerun the script with the new model since word on the street is that it is good.
Now that I had a digital copy of the book, I could get to creating my optimal UX for learning Albanian by reading.
I wanted to try building the app without React to see how it felt. I didn’t want to have to build jsx/tsx files into js. Saving the file + git push = live was just too fun. One html file, one js file, one css file: so simple, and a fresh (to me) way of thinking about how to solve the problems. There were several times I would add a useful new feature in 30 seconds and have it instantly available for real on my phone, and that was pretty sweet. And doing some raw innerHTMLing was very simplifying for certain parts. The biggest hiccups I had were 1. the css file would get cached on my phone and prevent me from getting the latest version, and 2. state management started to get a bit hairy and I didn’t have a good “component” setup, so I probably couldn’t make the app much more complicated before big regrets.
My feature wishlist going in was:
I figured that in order to keep my place, I needed to know which word of the book I was on (tracking only the paragraph wouldn’t work: if a paragraph were larger than a page, I wouldn’t be able to get to the next page), so I decided to index page position using two numbers: the paragraph number (the text split by newlines) and the word number (the paragraph split on spaces) within the paragraph. This way, given a paragraph number and word number, I know exactly where to start rendering text for the page. The indexing was also crucial for being able to support any page size and to turn the page. To render a page, I go paragraph by paragraph, first trying to fit the whole paragraph on the page, checking for overflow, and falling back to going word by word if the paragraph overflows. After filling the page with as much text as fits, I know the paragraph and word number where the next page starts. To turn the page, I just render the page starting at the index where we left off. Going to the previous page is slightly trickier, but it involves filling up the page starting with the last paragraph of the current page (just before the start of the next page) and prepending preceding paragraphs.
I also wrapped each word in a span identified by its paragraph and word number so that I could handle tapping on the word and style it when selected.
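A simplified sketch of that rendering loop (this version goes word by word for every paragraph rather than trying the whole paragraph first, and the element names are made up, but the overflow check and the paragraph/word bookkeeping are the idea):

```js
// Fill a fixed-height page element starting at (startP, startW).
// Returns the (paragraph, word) index where the next page begins, or null at the end.
function renderPage(paragraphs, startP, startW, pageEl) {
  pageEl.innerHTML = "";
  let p = startP;
  let w = startW;
  while (p < paragraphs.length) {
    const words = paragraphs[p].split(" ");
    const paraEl = document.createElement("p");
    pageEl.appendChild(paraEl);
    for (; w < words.length; w++) {
      const span = document.createElement("span");
      span.textContent = words[w] + " ";
      span.dataset.p = p; // paragraph number
      span.dataset.w = w; // word number within the paragraph
      paraEl.appendChild(span);
      // If this word pushed the page past its height, back it out and
      // report it as the start of the next page.
      if (pageEl.scrollHeight > pageEl.clientHeight) {
        paraEl.removeChild(span);
        if (paraEl.childElementCount === 0) pageEl.removeChild(paraEl);
        return { p, w };
      }
    }
    p++;
    w = 0;
  }
  return null; // reached the end of the book
}
```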
The rest was quite straightforward. I stored the position in localStorage, set up an event handler to catch the taps, and did some basic styling. Setting up the word cursor was simple: all I had to do was advance the paragraph/word position when you clicked the forward/back buttons. For translation, I was able to replicate the requests the Google Translate extension was sending. I was surprised how fast the translate API was, and together with the simple UX, the app is a joy to use because everything is so snappy. Here is a screen recording of the first version.
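For completeness, the position persistence and tap handling are about as simple as they sound. This is a sketch; the storage key, pageEl, and the showTranslation helper are hypothetical names, not necessarily what the app uses.

```js
// Persist the reading position across sessions.
function savePosition(p, w) {
  localStorage.setItem("position", JSON.stringify({ p, w }));
}
function loadPosition() {
  return JSON.parse(localStorage.getItem("position") || '{"p":0,"w":0}');
}

// One delegated listener on the page container catches taps on any word span.
pageEl.addEventListener("click", (e) => {
  const span = e.target.closest("span[data-p]");
  if (!span) return;
  document.querySelector(".selected")?.classList.remove("selected");
  span.classList.add("selected");
  showTranslation(span.textContent.trim()); // hypothetical helper that calls the translate endpoint
});
```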
This version was sick, but using it I already had a bunch of ideas for things I wanted to add. I realized tapping on each word was prone to fat-fingering and was impossible to do one-handed, so I wanted a next-word button that would always be under my thumb, ready to tap tap tap. I also sometimes wanted to stop showing a word in the corner when I wanted to focus. Sometimes the translations seemed off: either the one word I was showing from the translation wasn’t good or it didn’t make sense in context. I found myself copying the word and going to Google Translate to dig deeper. So I wanted to add a copy button as well that would copy the selected word to my clipboard.
After playing around with this version, it definitely felt better, but I had laid out the buttons vertically on the right, the touch targets were a bit too small, and the next-word button was too close to the next-page button, so I could easily skip a page by accident. (I could be pressing next word a hundred times per page; if the error rate on my taps were 1%, I could expect to mess up and annoyingly skip ahead on roughly every page. Unacceptable.)
Copy was useful, but I also felt that I could skip going to Google Translate if the translation had more context. So with this feedback, I reorganized the buttons and added a feature to translate text when you select it. This also ended up being really handy for seeing what a whole sentence means rather than just each individual word, a feature that would not have been possible with the Kindle approach. Overall I think what I have now is many times better than what I could have had with the Kindle approach, and it was probably easier to make. There are also a dozen new features that I would like to add in the future to keep track of words you’re learning, turn them into flashcards and exercises, etc.
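The copy button and the select-to-translate behavior are both a few lines. Again a sketch; copyBtn and showTranslation are the same hypothetical names as above.

```js
// Copy the currently selected word to the clipboard.
copyBtn.addEventListener("click", () => {
  const word = document.querySelector(".selected")?.textContent.trim();
  if (word) navigator.clipboard.writeText(word);
});

// After the finger lifts, translate whatever text was highlighted
// (handy for whole phrases or sentences, not just single words).
document.addEventListener("pointerup", () => {
  const text = window.getSelection().toString().trim();
  if (text) showTranslation(text);
});
```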
Here is the version I have now.
Try it here
Feature wishlist