Hacker News
by beckman466 1678 days ago
> I always look at these intros/descriptions of Regex with a heavy heart. They describe what regex's are, but none of the info is going to make much sense to someone who doesn't already know why they would want to learn them.

> Catch 22 of trying to explain what they are.

any teachers, or people who explain/document things for a living, have some good tips or templates to avoid this?

4 comments

I don't see what's so hard.

"With regex you can search for any combination of characters in a string or return any such combo or modification you like"

Yeah.

1) Encourage whoever you're teaching to stop you immediately if they don't feel like they understand something you're saying, even if it's a single word that's throwing them off, and especially if they're not rock-solid about a simple concept they "should already know". Modern school teaches people that "returning to the basics" is a waste of time; but as Feynman says, you should return to the basics often, as masters do. Pianists don't stop playing scales once they're famous. This means that if your student wants to review what an "expression" is, or what a "string" is, or what "returning" means, you've got to encourage them to do it. If a 10-minute explanation of RegEx turns into a 45-minute review of how the string variable type was invented, that will be more useful for the student in their pursuit of RegEx mastery than will a technically accurate but shallow regurgitation of your 10-minute spiel about what RegEx is. This is because they need to lay the mental framework of how they're going to think about RegEx; you are able to explain it in 10 minutes because you already have that built in your head, but they need to build those background pathways and connections themselves before analogies and summarizations make sense.

2) Try to figure out how you can make them experience the problem that led to the invention of RegEx. A student will never truly understand why a solution is valuable until they really, deeply understand the problem that the solution is solving. Note that I'm not saying that you need to teach the problem before the solution--not every student needs them in that order--just that they won't master the solution until they understand the problem.

3) In lieu of "testing" a student, have them take many breaks to re-explain what they've learned to you, even if you haven't reached a real conclusion about anything and are just checking that they understand a sentence you said. Many students, especially if they have a good teacher, will experience the sensation of comprehension even if it's not actually there. This is the "it makes sense when he says it, but when I try to explain it I can't find the words" phenomenon. Taking frequent breaks to have them explain things back to you in their own words will reveal their conceptual weaknesses, and those are what you focus on.

4) Don't try to get it all done in a single session. Learning requires both forgetting and sleep. First, you should tell them to expect to forget, and that they will need to come back over and over again to topics that seem basic or simple; forgetting is part of the process of learning, like painting multiple layers on a wall. Second, they need to sleep in between sessions, which means that you can't teach everything in one day and you can't learn everything in one day, and multiple days may need to be spent reviewing the same material.

This all makes a lot more sense when you treat learning like sports. Learning <programming topic> is like learning a slice serve in tennis. You don't need to serve slice, especially if you can hit flat serves at 115 mph, but serving slice is an invaluable technique when you're playing someone who can't return slice serves at all--that's a near-guaranteed 3/6 games out of every set. But in order to learn it, you need to focus on your tennis fundamentals (stay loose, eye on the ball, toss correctly), practice the same basic movements over and over again, get lots of sleep, and understand why you're learning the skill in the first place.

Good answer. I think 2) is the one that jumped out at me because it reflected my own experience and understanding - Regex became easier to understand when I also felt like I understood its motivations. Starting there, with motivation and context, is my typical go-to move.
Very valuable insight, thank you a lot!
Doesn't make sense to me.

> Regular expressions (commonly known as "regex") are used for advanced pattern matching in strings. They can also be used to replace text, transform strings, or extract substrings. It's a very powerful domain-specific language that is purpose-built for string patterns and manipulation. Many general-purpose programming languages include regex engines that use similar, but often slightly different syntaxes to support the use of regex.

>teachers, or people who explain/document things for a living

I'm neither of those, but I frequently explain things to my friends and they say I explain well. So I will throw in my two cents anyway and hope you don't find them trivial self-help platitudes.

(1) Start with Concrete things

No learning ever starts from generalities. Never start with something like "Regular Expressions is a declarative language to describe strings of a certain general form blah blah blah". I call this the Wikipedia style of teaching: an utterly useless word-swapping game where you explain things and constructs in terms of even more complicated (or equivalently complicated) things and constructs till the learner runs out of stack space and comes out learning nothing and feeling like a failure on top of that. Remember that learning is a process of building up: you start from familiar questions, problems, specifics, themes or worldviews of the learner, then gradually introduce generalizations and solutions to get them to where you want them to be.

(This is generally a two-way street: the learner also has to know something about the teacher, where they are coming from and what they are trying to do. It's like telling a story: the author can't simply say "because I say so!" to explain every detail of the plot, but the reader also can't say "I don't know, feels too unbelievable" in response to every plot detail.)

The bare essence of regex is using meta characters to encode several string characters. The fact that the regex

"meta.*"

so powerfully and succinctly encodes string-recognizing logic that would be imperatively expressed as

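def metastar(s):
    # Too short to contain the 4-character prefix at all
    if len(s) < 4:
        return False
    # s[0:4] is the first four characters; the end index is exclusive
    if s[0:4] != "meta":
        return False
    return True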

makes the case concretely and perfectly: a single string (two letters longer than the simplest string it matches) versus 3 bug-hiding branches (e.g. what if the "!=" operator in the implementation language actually compares string-identity, not string-equality?). This is even more generous than most languages allow: the ':' array slicing operator, for example, is saving us a loop (possibly inefficiently, if it's copying the slice from the string; not a problem now for "meta", but who knows when it will be?).
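To make the comparison something a learner can run, here is a minimal Python sketch that puts the two versions side by side. The function names are made up for illustration, and the re.DOTALL flag is my own assumption: without it '.' does not match a newline in Python, so the two versions would disagree on multi-line input.

```python
import re

# DOTALL lets '.' match newlines too, so the regex agrees with the
# hand-written version on every input (an assumption made for this sketch).
META_STAR = re.compile(r"meta.*", re.DOTALL)


def metastar_regex(s: str) -> bool:
    # fullmatch requires the whole string to fit the pattern
    return META_STAR.fullmatch(s) is not None


def metastar_imperative(s: str) -> bool:
    # The same logic spelled out branch by branch
    if len(s) < 4:
        return False
    if s[0:4] != "meta":
        return False
    return True


samples = ["meta", "meta-circular", "met", "Meta", "", "meta\nline"]
results = [(s, metastar_regex(s), metastar_imperative(s)) for s in samples]
```

Running both over the same list of samples is also a cheap way to let the learner find the off-by-one bugs in their own imperative version.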

Regexes are patterns, which are things that resemble the things they are describing, but aren't any of those things specifically. It's like a dark silhouette of a man: it doesn't describe any specific man, it's a pattern that can match any man of the same general body plan and height. Regexes are silhouettes, and the dark parts are the meta characters that act as placeholders for arbitrary strings.

(2) Examples from real life

Don't just take the "menu approach" of reading out all the features and meta characters and thinking you've explained something; actually take the time to work through examples. Again, examples are all that matters to the human brain: it's literally useless to tell somebody to imagine a golden mountain if they have never seen a mountain or gold before.

Our world is awash in strings of certain identifiable structures (money, dates, times, names in formal settings, equations, etc...). Take the time to obtain several real-life examples, and try to make the data come from sources like Wikipedia or other publicly available datasets. After demonstrating how each of those 3 or 4 general forms of strings can be described powerfully by meta characters, give 3 or 4 more general forms to the learner to try on their own.
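As a rough illustration of what such a demonstration could look like, here is a Python sketch with three deliberately simplified patterns. The sample sentence is invented, and the patterns are assumptions about what "a date" or "money" looks like; real data will need messier patterns, which is a useful lesson in itself.

```python
import re

# Simplified, illustrative patterns; real-world data is messier.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")           # e.g. 2021-03-14
TIME = re.compile(r"\b\d{1,2}:\d{2}\b")               # e.g. 9:30 or 14:05
MONEY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")   # e.g. $5 or $1,250.00

# Made-up sample text for the demonstration
sample = ("On 2021-03-14 at 9:30 the invoice for $1,250.00 was paid; "
          "a $5 fee followed at 14:05.")

dates = DATE.findall(sample)    # ['2021-03-14']
times = TIME.findall(sample)    # ['9:30', '14:05']
money = MONEY.findall(sample)   # ['$1,250.00', '$5']
```

A natural follow-up exercise is to hand the learner a string the patterns get wrong (a date written as 14/03/2021, say) and let them adjust the pattern.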

(3) Visualize executions, introducing debugging tools in the process

Just because regexes are declarative doesn't mean the matching process can't be described in imperative terms, especially initially.

Later on, introduce tools like https://regex101.com/ or https://www.debuggex.com/ and always draw "railroad diagrams" that show what a given regex matches in terms of easily verbalized diagrams.
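One way to describe the matching process in imperative terms is to narrate it step by step. The Python sketch below does that for patterns of the shape "literal prefix, then .*" only. The function name and the wording of the steps are invented for this example, and it is not how a real regex engine works (there is no backtracking here); it is just a teaching aid.

```python
from typing import List, Tuple


def trace_prefix_then_anything(prefix: str, text: str) -> Tuple[bool, List[str]]:
    """Narrate how a pattern like  <prefix>.*  is checked against text."""
    steps: List[str] = []
    for i, expected in enumerate(prefix):
        if i >= len(text):
            steps.append(f"pos {i}: wanted {expected!r}, but the text ended -> fail")
            return False, steps
        actual = text[i]
        if actual != expected:
            steps.append(f"pos {i}: wanted {expected!r}, got {actual!r} -> fail")
            return False, steps
        steps.append(f"pos {i}: wanted {expected!r}, got {actual!r} -> ok")
    # The '.*' part accepts whatever is left, including nothing at all
    rest = text[len(prefix):]
    steps.append(f".* consumes the remaining {len(rest)} character(s): {rest!r}")
    return True, steps


matched, steps = trace_prefix_then_anything("meta", "metal")
for line in steps:
    print(line)
```

Having the learner predict each printed line before running it gives them a mental model to compare against what the debugging tools show later.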

(4) Disadvantages, subtleties, and other approaches

The learning process isn't a sales pitch; there are plenty of things that suck in regexes. They are non-standard and designed ad hoc, the runtime engine that runs them can be inefficient (unlikely if the host programming language is popular and > 20 years old, but a thing to keep in mind nonetheless: regexes are a whole other language, requiring a separate interpreter or compiler other than the one for the surrounding code), and the equivalent imperative code might not be so bad in comparison for simple cases, and much more debuggable.

The name "regex" is derived from a misnomer. The original "regular expressions" are a mathematical formalism to encode finite-state machines, and originally contained only alternation, sequencing and Kleene star (the '|' and the '*' operators, plus putting letters next to each other; that's it, those were the original regex capabilities). When programming languages and command-line utilities started to implement them in the 70s and 80s, each started to experiment with features that break this model. For example, "capture groups", the ability of the regex to copy parts of the matched string into variables, trivially break the model: if you can capture arbitrarily long strings, then you can't be a finite state machine.
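For a learner who has not seen capture groups yet, a small sketch using Python's re module may help. The second half adds a backreference (\1), which is the commonly cited feature that lets a pattern demand "the same text again", something a plain finite-state machine cannot do for arbitrarily long text. The sample strings are invented for illustration.

```python
import re

# Capture groups: the parentheses copy parts of the match into groups
m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2021-03-14")
year, month, day = m.groups()   # ('2021', '03', '14')

# Backreference: \1 must repeat exactly what group 1 captured,
# so this finds an accidentally doubled word.
DOUBLED = re.compile(r"\b(\w+) \1\b")

hit = DOUBLED.search("it was the the best of times")
doubled_word = hit.group(1)     # 'the'

miss = DOUBLED.search("this is fine")   # None: no word is repeated
```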

This increases power but decreases efficiency guarantees (Perl's regexes are dangerously close to Turing-completeness [https://www.perlmonks.org/?node_id=809842]! The language is hiding a whole other language inside a single feature). It also complicates the notation with symbols for new capabilities that it wasn't designed for, with the result being the mess that regex syntax is now. It also means you can never "learn" regex; you can only learn (to whatever accuracy you care) Perl's regex, or Java's regex, or Python's regex. There is a vague set of commonalities, but don't rely on remembering which features are common and which are different when there are so many features implemented in so many ways.

Don't let the learner come away thinking that "declarative" is synonymous with regexes. For example there is the parser combinator style, which can encode the above example as something like:

the_specific_string("meta").followed_by(ANY_LETTER).repeated(ZERO_OR_MORE_TIMES).build_pattern().recognize("meta-circular")

The key idea at play here is a sort of "builder pattern". There is an abstract "parser" object that has a single recognize(str) method, and you can build your pattern by composing together the many customizable children that implement this abstract interface. The composition happens by "combinator methods", which take two or more parsers and build a parser that performs a mixture of their functionalities indicated by the name (e.g. followed_by() takes several parsers and sequences them next to each other, repeated() takes a list of parsers and iterates the last one any number of times, including skipping it entirely). The things being built to represent parsers are generally (in functional languages at least) closures, but there is no reason why this pattern can't be built on top of regexes: each step simply generates the equivalent meta-character, and build_pattern returns the final pattern string.
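The closure flavour of this idea can be sketched in a few lines of Python. The names are borrowed from the pseudocode above but written as free functions rather than a method chain, and everything here is a hypothetical toy rather than any real library's API. In particular, repeated() is greedy and never backtracks, which is enough for this example but not for patterns in general.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the position after what it
# consumed, or None if it does not match at that position.
Parser = Callable[[str, int], Optional[int]]


def the_specific_string(expected: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(expected) if text.startswith(expected, pos) else None
    return parse


def any_letter() -> Parser:
    # Like '.', accepts any single character
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) else None
    return parse


def followed_by(first: Parser, second: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        mid = first(text, pos)
        return None if mid is None else second(text, mid)
    return parse


def repeated(inner: Parser) -> Parser:
    # Zero or more times; greedy, with no backtracking
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = inner(text, pos)
            if nxt is None or nxt == pos:   # stop on failure or no progress
                return pos
            pos = nxt
    return parse


def recognize(parser: Parser, text: str) -> bool:
    # Accept only if the parser consumes the whole string
    return parser(text, 0) == len(text)


meta_star = followed_by(the_specific_string("meta"), repeated(any_letter()))
accepted = recognize(meta_star, "meta-circular")   # True
```

Each combinator returns an ordinary function, so a learner can call the pieces individually and see what position each one reports before composing them.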

There are tons of those "parser approaches": formalisms, tools, patterns and libraries to express strings, string recognition and parsing declaratively. Regexes are merely the most famous and widespread, which is a sad state of affairs IMO.