Lacuna Part 1: Mind the Gap

The old-school wisdom about learning a language focuses a lot on rules. Language textbooks start each chapter with a list of vocabulary and then proceed into sections, each describing a different grammatical pattern to puzzle over. Conjugations and plurals are laid out in tables telling you how the right word should be chosen at the right time, with their exceptions on the margins in little warning boxes. When you reach the end of the chapter, a common expectation is that you can now solve the language like it’s a complicated math problem. Proof of this may be requested with a long list of exercises, waiting for you to test how well you follow instructions.

The issue is that it’s hard to learn a language using only the rules. We don’t have the reasoning and memory speed to solve sentences like math exercises - our meat-brains trudge through manually assembling speech with confusion and inelegance. The default way everybody learns language is through intuition. You don’t sit down with your baby and give them grammar lessons. You just hang out with them and trust that the millions upon millions of sentences they soak up will do the heavy lifting for you.

In 2019 I started trying to write software to emulate this abundance-based learning. Over the decade prior I’d tried lots of apps advertising “immersion” but found most were layering minigames over 15-30 example sentences per chapter, the same as textbooks - enough variation to learn the rules, but not enough to get the gist. The alternative I wanted to try was generating sentences, swapping out words for nearby equivalents that wouldn’t mess up the grammar. No textbook or app author was going to sit down and write out a thousand examples per chapter, but with the right syntax, software might be able to match that immersion level of sentence volume - provided, of course, that the sentences made sense.

Generating sentences has both pros and cons. Language is complicated, which means the generator is complicated: outputted sentences need to be both grammatically valid and not conceptually nonsensical, requiring an internal syntax that can work with concepts like conjugation, plurals, and word associations. A human still has to write study material for that generator, and it’s hard work. Textbooks include exercises with the expectation that the student will grind through the rules and create more examples than what the chapter originally provided - if instead you want to create those examples upfront to immerse the student, all that complex sentence construction now falls on the author.
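As a rough illustration of word swapping with a sanity check, here is a minimal Python sketch. It is not Lacuna's actual syntax, which this post doesn't show: the vocabulary, the category names, and the `VERB_OBJECTS` table are all made up for the example, and real conjugation and plural handling would need far more than this.

```python
import random

# Hypothetical vocabulary, grouped into categories of interchangeable words.
VOCAB = {
    "person": ["the teacher", "my sister", "the doctor"],
    "food": ["an apple", "some rice", "the soup"],
    "drink": ["water", "green tea", "coffee"],
}

# Each verb names the category its object must come from, which keeps
# conceptually nonsensical output like "drinks an apple" from being generated.
VERB_OBJECTS = {"eats": "food", "drinks": "drink"}


def generate_sentence(rng):
    """Fill the template '<person> <verb> <object>.' with compatible words."""
    subject = rng.choice(VOCAB["person"])
    verb = rng.choice(sorted(VERB_OBJECTS))
    obj = rng.choice(VOCAB[VERB_OBJECTS[verb]])
    sentence = f"{subject} {verb} {obj}."
    return sentence[0].upper() + sentence[1:]


rng = random.Random(2019)  # seeded so repeated runs give the same sentences
for _ in range(3):
    print(generate_sentence(rng))
```

Even in a toy like this, the authoring cost is visible: someone has to decide which words belong together and which verbs accept which objects before a single sentence comes out.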

Luckily, the benefits quickly become clear. Templates scale exponentially, generating huge numbers of sentence variants with each replaceable field added. A lesson generator can track vocabulary and grammar independently of specific sentences, and each new word learned will propagate to previous sentence templates to increase variety even more. An introductory language deck with 100 templates can have over a million sentence variations. The resulting software has no need for minigames - it’s a flashcard app that shows you brand new sentences every day, keeps track of what you know, and trusts that volume will enable your language intuition just like it did for your first language.
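The arithmetic behind that scaling can be sketched in a few lines of Python. This assumes a simplified model where each replaceable field draws independently from one word list and appears at most once per template; the templates, slot names, and words are hypothetical, not taken from an actual Lacuna deck.

```python
from itertools import product
from math import prod
from string import Formatter


def slots(template):
    """Return the field names in a template such as '{person} eats {food}.'"""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def count_variants(template, vocab):
    """Variants = product of the word-list sizes of every slot in the template."""
    return prod(len(vocab[name]) for name in slots(template))


def expand(template, vocab):
    """Yield every sentence the template can produce (one use per slot name)."""
    names = slots(template)
    for combo in product(*(vocab[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


# Hypothetical deck: two templates sharing the same vocabulary.
vocab = {
    "person": ["Ana", "Ben", "Chen"],
    "food": ["rice", "soup", "bread"],
    "place": ["home", "school"],
}
templates = ["{person} eats {food}.", "{person} eats {food} at {place}."]

before = sum(count_variants(t, vocab) for t in templates)

# Learning one new word widens every template that uses that slot.
vocab["food"].append("noodles")
after = sum(count_variants(t, vocab) for t in templates)

print(before, after)
```

Under this model, a template with four slots of ten words each yields 10 × 10 × 10 × 10 = 10,000 sentences, so a hundred templates of that size would reach a million. The constraints that keep sentences sensible would trim that figure in practice.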

In the years I’ve worked on this project, I’ve found it interesting how machine translation software has also transitioned from learning with rules to learning with volume. The old AI programs of the 90s tackled language with tables and decision trees, trying to encapsulate the variety of language into simple yes-no logic. The results were comically terrible - words were mixed up, context was ignored, and nuance was forgotten, or worse, inverted. Only recently have machines become gifted at language, and they got there through a brute-force statistical approach. Large Language Models bypass linguistic rules and focus only on volume, training on billions upon billions of examples until they obtain a statistical intuition for how language should sound.

Our brains aren’t direct counterparts to any kind of software, but it should be obvious that they function closer to the latter approach. Textbook authors do their best to make a rules approach work, and with the right teacher, a classroom environment may be excellent and effective - but not everyone has this opportunity. Immersion is how everyone learns their first language, and learning new ones shouldn’t be restricted to those with an excellent tutor or those lucky enough to study for years in a distant country. Learning through abundance isn’t magic; you just need the abundance. It doesn’t matter where you get it.

Lacuna is a collection of my past few years of work on generators and quiz programs into a web-app. It’s currently in a private pre-alpha. In the next few posts, I’ll try to talk about how it works in a little more detail.
