Recent Articles


Ensuring directory structure with JSON Schema

It is sometimes difficult to keep a folder tidy and well-organized, especially if many people happen to write in that folder. Recently, we’ve seen this issue really growing with our designs folder, where it took minutes to find a specific PSD, if found at all, and there was no way of knowing if it was actually the latest version.

In order to deal with this issue, all the people working with those files agreed on a directory structure and cleaned up the old directory. However, once this was done, two needs emerged: making sure that everyone understood the same thing (spoiler: they didn’t) and making sure everything stays tidy.

Just as JSLint taught me how to keep JavaScript clean, I figured I’d need an automatic validator in order to keep my directory well-organized. Humans make mistakes, computers only make the mistakes they’re programmed to make. That’s why I tasked myself with writing a small program that would check the directory structure, and scream heresy on the team’s Slack channel.

Along the way, I also figured that writing a validation program is boring, and if I want to make it configurable for use with other folders, it’s going to be a pain in the ass. However, validating arrays and maps is the specialty of JSON Schema, which is by the way really awesome, and here’s some understandable documentation if you want.

Thus filemarx was created (“file marks”, but with a reference to communism’s standardization). It can simply be installed through pip and then used as explained in the README on GitHub.
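To give an idea of the approach (this is not filemarx’s actual code or configuration format), a directory tree can be turned into a nested document that any JSON Schema validator can check. In this minimal Python sketch, directory_to_document(), check_directory(), SCHEMA and the folder layout it describes are all hypothetical names chosen for illustration, and the validation call assumes the third-party jsonschema package.

```python
import os


def directory_to_document(root):
    """Map a directory tree to nested dicts: folders become objects, files become None."""
    document = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        document[name] = directory_to_document(path) if os.path.isdir(path) else None
    return document


# Hypothetical layout: one folder per project, holding only versioned PSD files.
SCHEMA = {
    "type": "object",
    "patternProperties": {
        r"^[a-z0-9-]+$": {
            "type": "object",
            "patternProperties": {r"^v[0-9]+\.psd$": {"type": "null"}},
            "additionalProperties": False,
        },
    },
    "additionalProperties": False,
}


def check_directory(root, schema=SCHEMA):
    """Raise jsonschema.ValidationError if the tree under root breaks the schema."""
    import jsonschema  # third-party, imported lazily: pip install jsonschema

    jsonschema.validate(directory_to_document(root), schema)
```

With this layout, a stray file such as designs/homepage/final_FINAL.psd would fail validation because its name does not match the versioned pattern.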

That’s it for today comrades, I assure you that you can use this script and still be loved by your colleagues!


ActivKonnect’s Django Template

After a few projects at ActivKonnect, it turns out that Django projects often have similar needs, since I keep my fetishes from project to project: bower, LESS, Bootstrap, authentication with AllAuth and so on.

I also started to create my own directory and configuration structure, inspired by Two Scoops Of Django’s template and other things I’ve seen around but not taken note of.

Moreover, I have needs of my own, such as automated deployment thanks to previously-written Ansible playbooks, which more or less assume a 12-factor app, meaning mostly that media files are hosted on Amazon S3 behind a CloudFront CDN. Not that I particularly need a CDN, but I find it cheaper and safer to do it this way than simply hosting the files myself.

Anyways, let’s not forget another requirement: Python 3 compatibility. I’m getting really frustrated with Python 2 now that I have started to taste Python 3’s sweetness (Unicode handling, more useful functions in the standard library, …), and I really wanted to be able to use Python 3 on my projects. The main blocking point was Django Storages, which doesn’t seem to handle Python 3. However, Boto does, so I vendored the s3boto storage into the template instead of installing it through pip, while waiting for my PR to be accepted.

So, no more talking. You can find the template on GitHub, and the README will explain how to use it.

Feel free to leave me a note if you use it and can think of an improvement!


FrigoTunnel Protocol

In one project, I have to remotely emit sounds on Raspberry Pis that are networked through Wi-Fi, using a tablet application. For example, at a button push, you need to hear sound A from nodes 1 and 2, and sound B from node 3.

Obviously, you don’t want to spend time giving IP addresses to things, or logging into the Raspis over SSH to set things up and so on. For those reasons, I came up with a very simple network protocol that will detect all nodes on the same network and send them messages automatically based on the name of each host. Host names are then put in configuration files, and all I have to do is plug in the Raspis, wait for their Wi-Fi to be up and enjoy.

Thus FrigoTunnel was born, a simple Qt/C++ library that implements that protocol of mine, and runs quite simply in my app. Technical explanations are available in the project’s README.

By the way, it requires Qt5 to run, which is a bit hard to get going on Raspbian. I installed Arch instead, which I found much more up to date, light and practical for raspi hacking. And it has systemd, which makes daemonization super-easy, which is also a great advantage.


Programmatically filling a PDF form (and list fields)

For some reason, I ended up having to fill in a form inside a PDF from a Django app. Initially I thought it would be an easy thing to do, but unfortunately not so much. Indeed, all I found was some StackOverflow question or a Django app, but none of them is able to list the fields inside the PDF. That is essential to my app, which has to be administered by non-tech users working with PDFs that they did not generate themselves. Plus, all those solutions rely on pdftk, which comes with an awful list of dependencies, and I don’t really understand how it works.

So I looked to other languages for help, and apparently there aren’t many options around. Java has PDFBox and C/C++ has Poppler. Since Java is a bit heavy to deploy, I felt like C++ was a simpler solution. And thus, pdf-form-filler was born!

It is a very simple tool, that accomplishes two tasks: list the fields in a PDF, and fill them. Currently, only text fields are handled, but that’s enough for my needs. The goal was to interface it from Python, so it produces JSON output and expects JSON data, as explained in the project’s readme.

So far so good, I just hope I’ll never have to understand how checkboxes work.


Doing iframe resize

As it happens, I had to integrate some external applications through iframes, and I re-discovered how much this sucked.

There are a lot of scripts out there, but I couldn’t find one that works for me, so basically I had to write my own. Let me introduce wrapframe, which aims at a seamless integration of an iframe into a document. It’s only an aim though, I don’t think it is ever going to be possible to do it right.

You can get the project scope and many other things in the project’s README; however, there are a few details I’d like to discuss here.

Watching the size of an element

In many cases you might have to watch the size of something, in order to react in case its size changes. Like change the zoom on a map depending on its own size, or in the present case get the size of an iframe’s content.

As all developers have been taught, polling is bad and should be avoided at all costs. Events are much better. Only here, there are just so many events that you can’t really track them all down. The window can be resized, a DOM element might be added or removed, a style might change, and so on. The possible changes are endless.

So what I advise is the simplest and most fool-proof solution: just fucking poll. Check the size of your thing every 1/60th of a second, and if you notice any change just trigger your stuff.

Another variant can be relevant in some cases (like when a resize is heavy to do): wait about 200ms for the size to stabilize, and only then trigger your resize actions. This avoids killing the CPU while the user resizes the window, for example.
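The settling logic does not depend on the DOM, so here is a minimal sketch of it in Python, with the clock passed in explicitly. SizeWatcher, measure and on_change are hypothetical names and this is not wrapframe’s code; in a browser, poll() would be driven by a timer and measure would read the element’s size.

```python
class SizeWatcher:
    """Poll a size and report it only once it has been stable for `settle` seconds."""

    def __init__(self, measure, on_change, settle=0.2):
        self.measure = measure            # callable returning the current size
        self.on_change = on_change        # callable invoked with the settled size
        self.settle = settle              # how long the size must stay unchanged
        self.last_seen = measure()        # size observed at the previous poll
        self.last_reported = self.last_seen
        self.changed_at = None            # time of the latest observed change

    def poll(self, now):
        """Call this at a fixed rate (e.g. 60 times per second) with the current time."""
        size = self.measure()
        if size != self.last_seen:
            # Still moving: remember the new size and restart the settle timer.
            self.last_seen = size
            self.changed_at = now
        elif self.changed_at is not None and now - self.changed_at >= self.settle:
            # Unchanged for long enough: report it, unless it came back to the old value.
            self.changed_at = None
            if size != self.last_reported:
                self.last_reported = size
                self.on_change(size)
```

Setting settle to 0 gives back the plain polling variant: a change is reported on the first poll where the size is seen unchanged.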

Get the size and position of a DOM node

There are a lot of properties you will come across, such as scrollHeight, offsetHeight and so on. Although those have their uses, they are clearly messy and deceptive. Moreover, you won’t really get the position of your element with them.

Of course, you could use jQuery, but it’s not always available to save you. Or maybe you just find it too slow.

A solution exists: there is the getBoundingClientRect() method. It exists on all browsers, behaves consistently and brings no surprises along. And it’s pretty fast.

Get the size of a page

It’s a fucking mess. Most of it is detailed in the project’s readme. I’ll just drop a few words here.

Basically, you have to know pretty well how your page works and what you’re going to do with it. Then you will know which elements are representative of your page height, and then you will know what to measure.

Cross-domain communication

Not much to be said. The top solutions out there are

  • Hash-based
  • Flash-based
  • postMessage() based
  • hybrid

And basically, postMessage() is available in all current browsers, so it’s the only viable solution. No need to use a bloated hybrid approach, nor infamous hacks on the URL hash.


I hate Internet Explorer. Really, from the bottom of my heart. I hate it.

Apart from that, there are a lot of tricks floating around, but it has been a real pain to sort them out and find which ones are useful. To sum up:

  • Cross-domain communication is neatly done with postMessage()
  • Getting the size and position of an element is really easy using getBoundingClientRect()
  • There is no way to see if an element’s size changed apart from polling it
  • The method to get a page’s height depends on its content and flow

Finding all that and trying all the other solutions took me quite a bit of time, so don’t make the same mistakes as I did :)


Sorting JSON fields in PostgreSQL

UPDATE: PostgreSQL 9.4 comes with the new JSONB type. It is better, stronger, faster and can be sorted. Just use that if possible.

PostgreSQL 9.3 comes with a new data type, JSON. If I had to give my opinion on it, to stay objective and moderate I’d say it is fucking awesome. A lot of other blog posts and documentation exist on the matter, so I won’t be covering this too much, but let’s just say that it enables you to make queries deep inside the JSON, like

SELECT * FROM mytable WHERE myjsonfield->>'title' = 'Look For This Title';

Now the only issue with that is that if, like me, you use Django, it is not yet fully compatible with PostgreSQL’s advanced features. Indeed, since it tries to provide a uniform set of features across all DB engines, its features are bound to what MySQL can do, aka barely key/value storage. Although it is in my opinion a huge mistake, there is currently an effort to properly support PostgreSQL in Django, and this is awesome news.

However, in the meantime we’re back to using hacks, and one of these hacks is to use django-jsonfields, which unfortunately isn’t bullet proof. And in particular, if you happen to do a .distinct() on a queryset that selects a JSON field, you will be nicely warned by PostgreSQL that you can’t fucking compare two JSON fields, and thus it is impossible to know which values are distinct. Actually, I don’t even know why the ORM is bothering to do it this way, since all that needs to be unique is the ID, but nevermind. Changing the way the ORM works is just a pain in the arse, and I really wanted to avoid it.

And there comes this StackOverflow thread. It shows how to add support for a comparison operator to the JSON field, which is just great. However, the given implementation suffers from several issues:

  • An expression like SELECT '{"a":1,"b":2}'::json = '{"b":2,"a":1}'::json returns false, because it is based on the string representation. This was a great idea, but unfortunately there is no way to make JSON.stringify() behave in a deterministic manner
  • It does not allow for comparison, but only for hashing. If the field stands alone this is not an issue, but if you start to mix it with other fields for your .distinct(), PostgreSQL will start to want it to be sortable. And since it isn’t, you will get a nice fail message.

So, the concept was good, but the implementation had to be different. Then I decided to create a json_cmp() function that would be able to compare (in the sense lower/greater than) two JSON objects, and that would power all the operators required for a b-tree operator class.

Because JSON is JavaScript, I decided to code the json_cmp() function in JavaScript, which requires activating the PL/V8 extension:

CREATE EXTENSION plv8;

Then, you need to create the following bunch of objects:

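CREATE OR REPLACE FUNCTION json_cmp(left json, right json)
RETURNS integer AS $$
    // Like typeof, but tells null and arrays apart from plain objects
    function cleverType(obj) {
        var type = typeof obj;

        if (type === 'object') {
            if (obj === null) {
                type = 'null';
            } else if (obj instanceof Array) {
                type = 'array';
            }
        }

        return type;
    }

    // Returns a negative, zero or positive integer, like strcmp()
    function cmp(left, right) {
        var leftType = cleverType(left),
            rightType = cleverType(right),
            output = 0,
            leftKeys, rightKeys, buf, i;  // locals, so that recursion is safe

        if (leftType !== rightType) {
            // Different types: order by type name
            output = leftType.localeCompare(rightType);
        } else if (leftType === 'number'
                || leftType === 'boolean'
                || leftType === 'string') {
            if (left < right) {
                output = -1;
            } else if (left > right) {
                output = 1;
            } else {
                output = 0;
            }
        } else if (leftType === 'array') {
            if (left.length !== right.length) {
                output = cmp(left.length, right.length);
            } else {
                for (i = 0; i < left.length; i += 1) {
                    buf = cmp(left[i], right[i]);

                    if (buf !== 0) {
                        output = buf;
                        break;  // the first difference decides
                    }
                }
            }
        } else if (leftType === 'object') {
            // Sort the keys so that key order does not matter
            leftKeys = Object.keys(left).sort();
            rightKeys = Object.keys(right).sort();

            // As arrays: compares the number of keys, then the key names
            buf = cmp(leftKeys, rightKeys);

            if (buf !== 0) {
                output = buf;
            } else {
                for (i = 0; i < leftKeys.length; i += 1) {
                    buf = cmp(left[leftKeys[i]], right[leftKeys[i]]);

                    if (buf !== 0) {
                        output = buf;
                        break;
                    }
                }
            }
        }

        return output;
    }

    return cmp(left, right);
$$ LANGUAGE plv8 IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION json_eq(json, json)
RETURNS boolean LANGUAGE SQL IMMUTABLE AS $$
    SELECT json_cmp($1, $2) = 0;
$$;

CREATE OR REPLACE FUNCTION json_lt(json, json)
RETURNS boolean LANGUAGE SQL IMMUTABLE AS $$
    SELECT json_cmp($1, $2) < 0;
$$;

CREATE OR REPLACE FUNCTION json_lte(json, json)
RETURNS boolean LANGUAGE SQL IMMUTABLE AS $$
    SELECT json_cmp($1, $2) <= 0;
$$;

CREATE OR REPLACE FUNCTION json_gt(json, json)
RETURNS boolean LANGUAGE SQL IMMUTABLE AS $$
    SELECT json_cmp($1, $2) > 0;
$$;

CREATE OR REPLACE FUNCTION json_gte(json, json)
RETURNS boolean LANGUAGE SQL IMMUTABLE AS $$
    SELECT json_cmp($1, $2) >= 0;
$$;

CREATE OPERATOR =  (LEFTARG = json, RIGHTARG = json, PROCEDURE = json_eq);
CREATE OPERATOR <  (LEFTARG = json, RIGHTARG = json, PROCEDURE = json_lt);
CREATE OPERATOR <= (LEFTARG = json, RIGHTARG = json, PROCEDURE = json_lte);
CREATE OPERATOR >  (LEFTARG = json, RIGHTARG = json, PROCEDURE = json_gt);
CREATE OPERATOR >= (LEFTARG = json, RIGHTARG = json, PROCEDURE = json_gte);

-- json_ops is an arbitrary name for the operator class
CREATE OPERATOR CLASS json_ops
   DEFAULT FOR TYPE json USING btree AS
   OPERATOR 1 <,
   OPERATOR 2 <=,
   OPERATOR 3 =,
   OPERATOR 4 >=,
   OPERATOR 5 >,
   FUNCTION 1 json_cmp(json, json);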

As you can see, the json_cmp() function tries to make somewhat meaningful comparisons, but of course this can’t always be the case. What matters is that it is totally deterministic and won’t return 0 unless the compared objects are strictly equal. Please note that the underlying JS function is tailored for JSON and V8; it will probably not be suitable for general-purpose deep object comparison.
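To play with the ordering outside the database, the rules json_cmp() aims for can be mirrored in a few lines of Python: compare types first, then scalars by value, arrays by length and then element by element, objects by sorted keys and then by values. This is a hypothetical port for experimentation only; clever_type() and this json_cmp() are illustrative names, and string ordering may differ from V8 for exotic characters.

```python
def clever_type(value):
    """Return the JSON type name of a value decoded by json.loads()."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before the number check: bool is an int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def _cmp(a, b):
    """Classic three-way comparison: -1, 0 or 1."""
    return (a > b) - (a < b)


def json_cmp(left, right):
    left_type, right_type = clever_type(left), clever_type(right)
    if left_type != right_type:
        return _cmp(left_type, right_type)  # different types: order by type name
    if left_type == "null":
        return 0
    if left_type in ("number", "boolean", "string"):
        return _cmp(left, right)
    if left_type == "array":
        if len(left) != len(right):
            return _cmp(len(left), len(right))
        for a, b in zip(left, right):
            result = json_cmp(a, b)
            if result != 0:
                return result  # the first difference decides
        return 0
    # Objects: sorted keys first, so that key order does not matter
    left_keys, right_keys = sorted(left), sorted(right)
    result = json_cmp(left_keys, right_keys)
    if result != 0:
        return result
    for key in left_keys:
        result = json_cmp(left[key], right[key])
        if result != 0:
            return result
    return 0
```

Wrapped in functools.cmp_to_key(), it can be used to sort a list of decoded JSON values in the same spirit as the ORDER BY shown further down.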

Also, be warned that the comparison seems to be somewhat slow if your JSON objects are big and nested, but it actually seems quite logical.

After creating those functions and operators, the following concept snippets should work:

-- Compare JSON fields (returns true)
SELECT '1'::json < '2'::json;

-- Sort a table according to a JSON field
SELECT * FROM mytable ORDER BY myjsonfield;

Please note that if you are a Django South user, you can create an empty migration for your app (./manage.py schemamigration myapp install_json_comparison --empty), and just replace the body of the forwards() method with something along these lines:

    def forwards(self, orm):
        # db is already imported from south.db at the top of a South migration
        db.execute("""
            -- All the SQL shown above
        """)

And voilà, with this Django’s ORM should be able to use your table as it pleases!

Anyway, I am really amazed by PostgreSQL, which proves to be really easy to extend. In order to remotely approach this result using MySQL, you would need to spend days writing a C extension, whereas there you can just stack up your little piece of Javascript on top of already awesome components, and this makes the magic. I think I’m in love <3


Gibi ― A random word generator

I have recently been generating a fake world map, with fake cities and so on. As this was dull and boring, I searched for something to improve my day, and finally remembered this Daily WTF post. It might not have been a great success for the other guy, but anyway it’s still funny to do.

So I wrote a few lines of Python, got the list of French cities and started the fun.

A Gibi from « Les Shadoks » :)

After a bit of experimenting and fiddling around, I came up with the gibi Python package. The installation is pretty straightforward:

pip install gibi

Then all you have to do is use the gibi command to either generate your Markov chain matrix or generate random words. It turns out that a few things do improve the result:

  • Filter out the words that are too short or too long (between 3 and 30 characters long seems nice)
  • Do not only consider the last character when generating the output, but the last 3 characters. The number can vary, but I find that the results with 3 are the best.
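Those two tricks can be sketched in a few lines of Python. This is not gibi’s actual implementation or API, just a minimal character-level Markov chain under the same assumptions: build_chain(), generate_word() and the START/END markers are hypothetical names, and the random generator is passed in so that a seed gives reproducible words.

```python
import random
from collections import defaultdict

START = "^"  # padding marker, assumed not to appear in the words
END = "$"    # end-of-word marker, same assumption


def build_chain(words, order=3, min_len=3, max_len=30):
    """Map every run of `order` characters to the characters seen right after it."""
    chain = defaultdict(list)
    for word in words:
        word = word.strip().lower()
        if not (min_len <= len(word) <= max_len):
            continue  # filter out the words that are too short or too long
        padded = START * order + word + END
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain


def generate_word(chain, rng, order=3, max_len=30):
    """Walk the chain from the start state until the end marker or max_len."""
    state = START * order
    letters = []
    while len(letters) < max_len:
        following = rng.choice(chain[state])
        if following == END:
            break
        letters.append(following)
        state = (state + following)[-order:]  # keep only the last `order` characters
    return "".join(letters)


if __name__ == "__main__":
    cities = ["paris", "marseille", "lyon", "toulouse", "nantes", "montpellier"]
    chain = build_chain(cities)
    rng = random.Random(42)  # same seed, same words on every run
    print([generate_word(chain, rng) for _ in range(5)])
```

With such a tiny word list the output mostly reproduces the input; the fun starts with a few thousand names.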

You can also use it as an API, as shown on Github.

Another bonus feature: you can seed the PRNG, meaning that you can produce a deterministic result given an input. This helps me to always generate the same city names across my runs. It could also become some way of hashing stuff, especially in combination with a cryptographically stronger generator like Skein, although I seriously doubt both the security and the usefulness of this technique.

A few words, for the end:

% gibi generate -n 15 french_cities.gibi

This all sounds pretty useless, but anyway it was fun!


Thumbor and Heroku

Since I am currently working on Good Morning Planet, where we will handle a lot of user pictures that will have to be resized, the question came up of how to generate thumbnails.

As always, probably due to my inner nature, I prefer lazy solutions. In this case, what I will be using is Thumbor, which is an awesomely incredible thumbnail generator. It will detect faces and features in pictures before cropping, and will generate stuff on the fly.

I’ll let you see the details, but in short it’s exactly what I intended to code from scratch, only better.

Now I want to use this on Heroku. But the smart cropping feature requires the OpenCV Python module, which is not bundled in the default Heroku stack, and not available through pip, because it’s a real hell to compile.

So, what I did instead was to create a custom buildpack: Jetpack. It is designed to be modular, though it is essentially meant to build my Python/Django apps.

On top of this, I have created a dedicated “prepack” (see Jetpack’s readme) and put up a custom configuration (also explained in its own readme).

Overall, the Jetpack provides a way to run Python + OpenCV, and the Thumbor Heroku configuration provides inspiration if you ever want to deploy Thumbor on Heroku. Please note that this has never been tested in production, even though this will be done soon!

Happy hacking :)


URL regular expression

Well, yes, it’s as old as the world, but whatever, it’s still useful.

My issue here is to match a URL in a text, in order to auto-replace it with a link (say in a messaging application). I am by no means trying to validate it!

What you’re getting is the following: (^|\s)((f|ht)tps?://([^ \t\r\n]*[^ \t\r\n\)*_,\.]))

It basically matches everything that is not a whitespace character after http://, but also tries to handle things like URLs between parentheses or followed by other punctuation. So if you write something like Check out, it's awesome!, it will not include the , in the matched URL.
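As an illustration, here is a minimal Python sketch that uses the expression above to turn URLs into links. The linkify() name and the example.com addresses are made up for the example; the text is HTML-escaped first so that the result can be dropped into a page.

```python
import html
import re

# The expression from above, unchanged: group 1 is the leading whitespace
# (or the start of the text), group 2 is the URL itself.
URL_RE = re.compile(r"(^|\s)((f|ht)tps?://([^ \t\r\n]*[^ \t\r\n\)*_,\.]))")


def linkify(text):
    """Escape the text for HTML, then wrap every detected URL in a link."""
    escaped = html.escape(text)
    return URL_RE.sub(r'\1<a href="\2">\2</a>', escaped)


if __name__ == "__main__":
    print(linkify("Check out http://example.com, it's awesome!"))
    print(linkify("(see https://example.org/a_b)."))
```

In both sample sentences the trailing punctuation stays outside of the link, since the last character of a match cannot be a comma, a dot or a closing parenthesis.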

By the way, in order to do that I used Debuggex, which is getting really handy to work with regular expressions!

That’s it for today :)


IE6 invented the Web 2.0

I just wanted to share a small revelation I had, and it all starts with the XMLHttpRequest. First of all, the naming of it is quite weird: it’s XML but Http. Who might be so inconsistent to name an object this way? It should rather be XMLHTTPRequest or XmlHttpRequest… I can sense bullshit here, might Microsoft be behind that?

Well, it turns out that Microsoft did indeed introduce this object, and it was named XMLHTTP. And actually the wrong name comes from Mozilla, but Microsoft still invented it. Damn. (Wikipedia for further information)

But wait a second and think about the implications of this. It’s true that there are many things in browsers, with more and more appearing every day. There is a whole galaxy of HTML5, CSS3 and JS APIs out there just waiting to be discovered by developers to do awesome things. But the cornerstone of it all, the one behind the whole AJAX hype that went on 10 years ago, is purely and simply our XMLHttpRequest friend!

And there is more. Indeed, did you notice what hell it is to build an HTML5 website for IE6? All those polyfills and compatibility libraries in order to emulate HTML5 features, that’s such a pain in the ass, right? But you know that IE6 was released in 2001; do you think you could even come close to this with Phoenix 0.1, which was released in 2002? That’s right, Microsoft managed to make a browser so extensible that it lasted 10 years.

So, in case you were wondering: yes, IE6 did pretty much invent the Web 2.0.