Xdxf Dictionaries

Xdxf Dictionaries Online
Xdxf Dictionaries English

I've added a conversion from KindleUnpack'ed book.html to xdxf- and Stardict- and pocketbook-format to the pocketbookdic script on github. I've only converted one dictionary with it, so feel free to test and post issues.
XDXF is a project to unite all existing open dictionaries and provide both users and developers with universal XML-based format, convertible from and to other popular formats like Mova, PtkDic, StarDict, ABBYY Lingvo etc.

Fora Dictionary Pro is a dictionary viewer.

Features

Available dictionaries As of December 15, 2006 the XDXF project repository contains 615 dictionaries, which are collectively 936,189,613 bytes in size (compressed) and contain 24,804,355 articles.

Fast and fully offline operation
Compatibility with StarDict, DSL, XDXF, Dictd, and TSV/Plain dictionaries
Typing friendly search with case, diacritics, and punctuation tolerance
Fuzzy search
Full-text search
In-page popup translator
History and bookmarks

Downloads

Platform	Application	Version	Requirements	License	App Store
ForaFX.jar	26.5	Oracle Java 8	Freeware	N/A
ForaDictionaryPro.apk	26.5	Android 4.4 or later	Donationware^[1]
N/A	26.5	iOS/iPadOS 9.0 or later	$4
Updates	Change Log
Checksums	MD5 or SHA-256

1. We request happy users of the app's Android version also purchase it's iOS/iPadOS version using the link above.

Examples

No	Name	Languages	Package	Source
1.	WordNet®	English	WordNet.zip	https://wordnet.princeton.edu
2.	FOLDOC	English	FOLDOC.zip	https://foldoc.org

Disclaimer: The installable packages above are provided for the sole purpose of demonstrating the application's essential viewing functionality. The author of the application hereby rejects any and all liability for the contents of the packages.

Compatibility

Fora Dictionary Pro is compatible with the following dictionary/file types:

StarDict dictionaries (*.idx)
DSL dictionaries (*.dsl)
XDXF dictionaries (*.xdxf)
Dictd dictionaries (*.index)
TSV/Plain dictionaries (*.txt, *.dic)

Setting up dictionaries

Connect your mobile device to your computer.
Copy dictionary files to the app's documents/files folder on device. See Android or iOS help for details.
Select dictionary index file as listed in the compatibility section above (or an archive of it) using 'Import Dictionary' option of the 'Manage' menu.
Attach resource ZIP files (if any) using 'Attach ZIP' option of the dictionary menu. (optional)
Edit properties of dictionary such as visible name using 'Edit Properties' option of the dictionary menu. (optional)
Create full-text search index of dictionary using 'Upgrade' option of the dictionary menu. (optional)
Create profiles to group and organize dictionaries. (optional)

Resource files

Resource files of a dictionary can be put into multiple ZIP files with the following properties:
i) Classic (non-ZIP64) ZIP file type
ii) Flat (no sub-directories) file structure
iii) Maximum 65,535 files per ZIP file
Use 'Attach ZIP' from dictionary menu to copy and attach ZIP files to a dictionary.

Full-text search

The app supports searching full-text of all dictionaries for exact matches. The feature requires one-time upgrading of a dictionary which can take very long time to complete, since every single word anywhere on the dictionary is made searchable during the process.

Synchronizing between devices

Copying/moving dictionaries between devices is done using one of the two available methods:

'Export Dictionary' to a *.zip file on the first device and then 'Import Dictionary' that *.zip file on the second
Copy/move whole '.fora' or '.foradictionary' folder or individual dictionary folders using a file manager

Import from URL

Automated importing of dictionaries from URL is no longer available. Follow the following steps to import dictionaries from URL:
i) On your computer, enter the URL on your web browser and download the file.
ii) Connect your mobile device to your computer and copy the downloaded file to the app's documents/files folder on device. See Android or iOS help for details.
iii) Open the app, select 'Import Dictionary' from 'Manage' menu, select the file, and proceed.

Search types

There are five kinds of searches you can perform on dictionaries.

Regular search: Shows results exactly matching the query.
Extended matching search: Shows results matching the query with case, diacritics, and punctuations ignored. Suggestions include in-phrase and phonetic matches.
Full-text search: Shows list of articles containing exact matches of the query. The scope of search is not limited to headwords and includes all text in all articles (definitions, synonyms, examples, etc).
Fuzzy search: Shows list of articles most similar to the query. The search works like a spell checker for words you're not sure how it's written/spelled.
Wildcard search: Shows list of articles that match the criteria set with a wildcard query.

Troubleshooting and FAQ

Q: I bought an offline dictionary, can I use it in Fora Dictionary Pro?
A: Probably, no. Most standalone dictionaries have it's own proprietary data format incompatible with general-purpose dictionaries. You need dictionary data in one of the supported formats.
Q: Suggestions are slow when a lot of dictionaries are installed.
A: Unlike traditional search auto-completion, the app performs a complete search and matching of the query (e.g. occurrences in multi-word phrases) on all dictionaries in profile when suggestions are activated. Grouping dictionaries into smaller sets based on usage context using profiles may be helpful.
Q: I can't find the dictionary I need or the dictionary isn't displayed properly when searching.
A: Importing the dictionary using the latest version of the app or using alternate formats may be helpful.
Q: I can’t find the '.fora' folder.
A: Probably it is considered hidden. Please use a third-party file manager such as: muCommander
Q: How do I uninstall Fora Dictionary Pro from my Android device completely?
A: The application stores settings and dictionary data on the external storage, hence, you will need to manually delete the '.fora' folder there to clean up after uninstallation.
Q: How do I uninstall Fora Dictionary Pro from my computer completely?
A: The application stores dictionary data and settings separate from it’s app folder. After deleting the app, you must also delete the folder named “.foradictionary” and the file named “.fdconfig” in current user’s home folder (for example, C:Users<username>.foradictionary on Windows or /Users/<username>/.foradictionary on Mac). If you can’t see the folder or file, you can use a file manager, e.g. muCommander.
Q: Why is Fora Dictionary Pro cheaper on Android than iOS/iPadOS?
A: Cost of both developing for the platform and maintaining of an app in the marketplace is different on the two platforms and hence we reflect the difference in pricing the app.

Android 10+

Shared/external storage mechanism known as the SD-card has been retired starting with Android 10. The operating system now enforces all apps use it's sandboxed folder for all it's app data. If you upgrade your device to Android 10+, you may need to copy/move and replace '.fora' folder from your device's shared storage to the app's sandboxed folder (typically Android/data/com.ngc.fora.android/files) using a file manager.

Fora Dictionary

Fora Dictionary Pro is a cross-platform (i.e. working on various operating systems) version of Fora Dictionary and is fully data compatible with it. Auxiliary features such as deep customization and operating system integration are not available yet since the app works across different operating systems. Please note that we have retained and improved all essential dictionary viewer features like various search options and the in-page popup translator.

Known limitations

Slow importing: The app converts every dictionary to it's own cross-platform format and sometimes this conversion takes very long time. Some big dictionaries require this conversion be done on a powerful desktop computer, and some bigger dictionaries take long time even on a powerful desktop computer.
Limited on-device upgrades: The optimization of full-text search upgrade is not available on mobile devices due to cross-platform memory management implementation and greedy nature of the operation. This setup-only limitation doesn't affect any usage features on any device.
No embedded player: The app doesn't play resource files itself, but uses the device's default media player. The default player determines which kinds of files can be played.

For developers

Integration with other apps or the operating system is briefly described here.

EDIT:
This page is obsolete.
The GitHub project got scrapped and offline_dictionary.com replaces it. Check this post instead.
But the technical information below still stands.

There you are, ready to learn lots of nice things.

The project is hosted on my GitHub right there :

It's written in C# so deal with it. I was a Java fanboï too in the past. But then I realized it was unfortunately shit. Plus, now C# is open source so your only argument in favor of Java is gone in pieces. I know because it was the one I used too ;)

Also, I like to troll fanboïs a lot. So calm your tits; code is code regardless of its language.

But still... C# > Java... deal with it （╯°□°）╯︵ (~ .o.)~

Below you will find the big lines of the GitHub project and how it's structured.

Get the offline dictionary

First, you need to know that the dictionary.com's mobile app (here) has an 'offline' option... at least on the Android version... and at least on the 'premium' paid version...

Whatever, now you know how I got the 'source' to create the XDXF.

Notes:

I am using BlueStacks to emulate an Android device, on which I installed the app. Whatever device you will be using, you are gonna need to have a root access as the offline dictionary is in protected folders. So it might be easier for you to grab one of these pre-rooted BlueStacks versions.

Now once you get that app, go to settings ...

dictionary.com app

... and check the offline options.

dictionary.com app settings

Now the app is gonna download the whole dictionary on you device.

You will find it in the following location :

Root Explorer

We are interested in the android-08-08-primary.sqlite file.

You can even open it directly, and see a few definitions from the content_blocks table.

Root Explorer's built-in SQLite Viewer

You need to get that file. You manage.

Me, I copied this file into my BlueStack's shared folder which is on the device:

And the corresponding folder on my host Windows machine is:

Whole databases folder retrieved

So now you got the whole dictionary.com content on your computer.

Get the offline dictionary: hacker version

I wanted to download again the database to make sure that I had everything in order and no damaged file. But instead of doing all of the above I was like, I'm gonna find the URL and download that shit directly.

Not sure why, but Fiddler never showed anything. So, I went deeper and popped Wireshark open.

POST http://app.dictionary.com/mobiledownload/Primary-Android-v45

HEADERS Content-Type: application/x-www-form-urlencoded

That's it. Not sure what this token is. Since I'm using the 'premium' paid version of the app (don't ask...) , maybe this grants a better offline dictionary.

Extract the data from the SQLite database

Everything I'm gonna say here is unconfirmed and is just me doing reverse-engineering the database's structure.

Notes:

I used DB Browser for SQLite to debug queries, to try and understand the database layout, etc.

I'm lazy and I don't feel like explaining the database layout. So I'm just gonna throw a couple screenshots instead.

DB Browser for SQLite

It's just a matter of using the C# System.Data.SQLite.Core nuget package and then loop on each 'word' to retrieve its definitions.

Well, I say that as if that's super easy... it is somehow, but it gets harder to do so fast.

In order to extract all of this stuff, I had to thread the readings from the database.

I built a giant ConcurrentDictionary in which I put stuff.

It still takes around 40 minutes to extract the whole thing, using my 16Gb RAM and 8 Cores.

Visual Studio's Diagnistic Tools show high CPU usage on all cores

Build the XDXF from the extracted data

Almost a much trickier part than extracting the data from the SQLIite, as the definitions from the dictionary.com database contains HTML and even sometimes JavaScript code...

In addition we have encoded HTML entities and broken IMG tags which aren't self closed... WTFdictionary.com! You made my job much hard! Much pain occurred to me ;(

This is pretty bad since most XML parser will complain that it can't parse such a mess.

So, we have to pull this awesome HTML parser out from nuget: HtmlAgilityPack.

This parser guy will read shit HTML and can close unclosed tags, doesn't give a fuck if a <BR> or <IMG> tag is unclosed (because it supports old ass HTML specs), etc.

So I'm parsing with this guy, then I'm converting back the HTML entities to Unicode except for the characters that need to be escaped in XML.

Which are the following :

<rage>

To the other people on the interwebz that say 'meeh there is also the single and double quotes you n00b' will be thrown into blazing fucking fire:

Single and double quotes are to be escaped ONLY in XML attributesNOT in tags!

</rage>

Once the definition has been cleaned up a little bit as described right above my rage rant, we can put it inside a <def> tag which defines a definition in XDXF.

If you know a little bit about XDXF (and you most likely don't), you need to know there are two ways to represent a definition:

Visual
Logical

In 'visual' format, the <def> will contain stuff, and that stuff is the definition content. It most likely will contain the visual representation of the definition.

In 'logical' format, the <def> will be structured with XDXF tags representing different parts of the definition. Like an <ex> to indicate a usage example of the word (instead of italics for instance), or <etm> to indicate the etymology of the stuff, etc.

In this project, I will be using the 'visual' format for the following reasons:

Because the definition itself from dictionary.com is made outta (crappy) HTML, so it's already a visual representation of the definition;
Because it would be too hard to parse this HTML and convert it to a semantical XDXF fragment stripping out all of the visual information;
Because my personal goal here is to be able to convert this XDXF using the Russian's tool so I can enjoy it on my PocketBook, and most likely this little tool will not support the 'logical' format.

The output XDXF looks like something like that

And finally here is the XDXF:

Yeah... it's a pretty big mofo

Download the 7zipped version there:
dictionary.com_5.2.2-08-08.7z
Damn, this guy is too big, and it crashes the the Russian's tool that is supposed to convert XDXF to ABBYY ... crap.
Guess that will be the next episode then. Gotta do this shit by myself.

Performance considerations

The hardest part of this project.
In the current version of the offline database '08-08' there are 149135 word entries.
We need to get their IDs and then to go and grab their definition, plus get the 'similar' words that have the same meaning which are in another table.

Tasks

Doing this in a synchronous way and I guess a couple days would be required.
In an async way though, a good hour is required.
Right now I'm using Task to create the parallel tasks, with one task responsible to build the definition of one word. Which means, that I am creating 149135 tasks :)
...
..
.
'OMFG WTF are u doing!?' you are thinking.
Fear not, the Task class works with a goddamn good task scheduler. Yes I will create 149135 Task objects, but only 8 or 10 will actually run concurrently. All of the other tasks will be marqued as WaitingForActivation.
It's all good right there. A Task object (I guess) only contains a reference to a delegate. Which is like a pointer (I still guess) which is like a Int64 on my 64 bits CPU (I'm still guessing).
So it's prolly like:
149135 * 64 bits = 9544640 bits
=> 1193080 bytes
=> 1165 kb
=> 1.13 mb
It's nothing.
Plus, I clean the tasks list every second to remove done tasks (it's easier to debug that way I have only the remaining stuck tasks)
And BTW I tried using the new Parallel static class. This is shit. my CPU was not working at all. Even after setting a MaxThingy in its configuration to MAXINT. It's just not brutal enough, and was going 4 times slower at least.
Maybe I just don't know how to get the best outta it but anyway I reverted and used Task instead.

Threaded SQLite

To thread the reads from the SQLite we need to open one connection per thread. Too bad we will be suffering lotsa overhead but that's the only way.
Still it's slow. So I tried a couple things to speed the process. However none really worked.
First I moved the SQLite database file to my SSD drive.
This worked well, as before I could see that my CPU was not working 100%. I guess the bottleneck was the I/O in the drive.
Then I tried to move the SQLite database file to a RAM drive. Why the fuck not uh?
I used ImDisk Virtual Driver and copied/pasted the file there. No speed increase but, this will stop fucking my poor SSD. So I still recommend that to save the life-span of your SSD a little bit.
Finally I moved the data to a SQL Express Server. I used the trial version of ESF Database Migration Toolkitto make the migration. But no speed increase either. So there's was no point.

Storing the whole thing

Then we put put all of this in memory... now this is going to take up some space all right. I can't write the stuff directly into the XML because when I get a definition for a word, this may not contain all of its definitions...
Let me explain.
For instance, when we read the definitions for the word 'fame' we get stuff. We also know that 'famed', 'overfamed', etc. also have the same definition as 'fame'.
But, when later I read the definition of 'famed', we get an extra new definition that only relates to the 'famed' adjective. In essence, 'famed' will have its own definition plus the ones from 'fame'.
You can check it out only directly at dictionary.com. Go on, type 'fame' and open another tab and type 'famed'. Now compare both. The word 'famed' outputs 'famed' definition + 'fame' definitions.
So yeah...
With these considerations, I have to store the whole thing in memory and little by little update words definitions with their 'parent's word definitions.
There must be another way, another coding design, but so far I don't see one.

Updating the whole thing

Since multiple threads are messing with the same definitions we need to lock that shit so it's thread safe. For this job I'm using a ConcurrentDictionary with

Xdxf Dictionaries Online

List inside.
Because I store definitions by words, and I add words from different threads, I need ConcurrentDictionary. And because sometimes I update the definitions from different threads too, I also need to protect the definition collection, so I'm using a lock around the List.
So I have tried the SynchronizedCollection vs ConcurrentBag instead of the List. Now I lack knowl-edge and experience in threaded coding in general but I had issues with SynchronizedCollection. These mofos were throwing CollectionChanged exception (or something) sometimes. Which probably means that each atomic operation like Get/Add/etc. is not locked. So I had other threads messing with my collection during a foreach.
But with the ConcurrentBag I never had a single one exception. I guess that's because ConcurrentBag has a locking mechanism per thread. Not only per atomic operation.
Anyway ConcurrentBag was overkill so a simple lock around my List is most likely faster.

Writing the XDXF

That sounds easy but not quite. At first I was like creating a DOM in memory, effectively duplicating the whole structure twice in memory. Actually more than twice since the XML nodes also contains tag strings and shit. I quickly experienced an OutOfMemoryException.

So the only way is to write the XDXF piece by piece while reading the content of our 'whole thing'. That's it.

One bug

There is one. The very first tasks I talked about: they seem to never complete.

But it's not a Task deadlock that is freezing the execution, it's the freaking SQLiteDataReader that is like... waiting.

The weird thing is I had exactly the same issue at work. You know when I do... uh... 'tactical' programming and shit. When I do operator style CQB coding.

Whatever, we had this issue with sometimes an SQL query locking itself.

We are using Entity Framework 6 and out of desperation of finding nothing online nor by ourselves, I grabbed the EF source code and deep debugged our shit. Putting breakpoints all around inside EF. This shit was insane. When you go in there man, it looks like nothing. There are Invoke and delegates everywhere, oh man. My ass still hurts thinking about that.

But after 10 hours of deep ass debugging, I ended up on the freaking DataReader doing nothing. Waiting. Using no CPU. No memory. The mofo was just chilling.

What's interesting is that I found something on the interwebz. People taling about the freaking DataReader that is Lazy. Like, unless you try to evaluate the thing linked to the reader, nothing is happening.
Makes sense. Moreover, I was using IEnumerable to try and optimize the readings from the SQLite database. So it could be... that somewhere in my foreach loops, somehow an iterator is getting lost along the way, which means that one item will never be evaluated. Which mean that the reader will never try to read. Because no one needs the data.
That was a very interesting theory. Unfortunately, I ended up testing with ToList() everywhere in the code, making sure that everything would be evaluated. And the bug was still there. Still waiting chilling for the last 5 tasks.

This is the usage at the 149133th word. It's been like that forever.

And, the surprising thing is that it doesn't crash. Nope. If I put a break-point after the reader, and wait for the break, and check the value returned by this guy, it's valid. There is actual valid data in there. Nothing fancy, nothing huge, just the usual definition in there.
Very weird.
So even though the extraction from the database is around 30/40 mins, it can last up to 3 hours just because these last 5 freaking tasks are chilling.
Which is still better than doing this shit synchronously...

Solved

Not sure how though. I started to remove the completed tasks from my huge tasks list. I do this from an 'update' task that is in a while(true)

and shows the progression. Even second I RemoveAll() the completed tasks.
Also I dropped the ConcurrentBag and used a simple lock.
Those are the two actions I did, and now it completes fine.

Xdxf Dictionaries English

Licence

The license for 'DictionaryDotComToXdxf' is the WTFPL: Do What the Fuck You Want to Public License.