Log edited with Logedit 2.7.0pl on Mon Apr 9 10:20:48 CDT 2007
Using configuration file /home/dunemush/.logeditrc
Editing out: @admin @chat DOING/WHO arrive/left @mail pages ANSI O-spam Timestamps
Regexp stripping: Mailbox cleared, Message * marked for deletion, Semaphore, Halted, Queue
Logged by Molikai
Word-wrapping at 72, 0, 2
-----------------------------------------------------------------------
M*U*S*H* - Sunday, April 08, 2007, 7:35 PM
------------------------------------------
Cheetah shuffles a stack of papers, eyes them, looks annoyed and re-orders them properly.
Sketch says, "Announce it I say! :)"
Trinsec says, "Yeah announce your lecture again including room."
Tokeli idles in here, since he hasta go leave to find easter eggs.
Sketch says, "As though there were exciting things happening anywhere else on the MUSH.. <_<"
Molikai will be quiet for oh, the next 20-30 minutes as he cooks dinner.
Sketch will TRY to be quiet for the next 20-30 minutes.. O:)
Trinsec grins.
Cheetah makes sure he doesn't reshuffle his papers again, coughs, and starts..
Cheetah says, "Hey folks. Thanks for attending. Today I'll attempt to shed some light on Regular Expressions, AKA regexps or regexes. I'll start with some generic info, then I'll talk about basic syntax, more details on how to use them with PennMUSH specifically, some tips, and from there on I'll branch out with more advanced usage, examples and tricks. If, at any point you have a question, please 'use blackboard' to let me know. Finally, the 60 minutes set for this lecture was a default. I'm not sure how long we'll take all in all, though the basics should be covered within that time. I'll be taking requests and questions in the final part, so I'll go on as long as there's interest and material to cover."
Cheetah says, "Aye, Trin?"
Trinsec hehs and couldn't resist using blackboard. :P Thought it'd list commands. ;)
Cheetah says, "No, it actually lets me know you have a question ;)"
Trinsec nods and knows now. Thanks.
Tokeli <.<
Tokeli wanted to see what it did. Quiets now.
Kimiko mew Tokeli SHHHHHHHHHHHHHH.
Cheetah says, "Anyone else wanting to test the 'use blackboard' command before I go on? ;>"
Cheetah @baps Tokeli. "Anyone /else/ I said :P"
Sketch grin.
Sholevi says, "Sorry. :P"
Cheetah continues: "As most of you know, at least, Regular Expressions (I will refer to them as 'regexps' during the rest of the lecture) are a way of matching things. They're much harder to understand than globbing (where * means zero or more characters and ? means exactly one), but they're also much more powerful. There are several 'dialects' of regexp, which all vary a bit in how they work, and what extra bits you get."
Sholevi will stay after class to clean the erasers.
Cheetah says, "Penn uses the 'PCRE' regexp engine, so that's the dialect we'll be focusing on. PCRE stands for 'Perl Compatible Regular Expressions', so obviously what you'll learn today also works in Perl. For those interested, they also work in tf5, mIRC, various programming languages, and in general most(?) modern applications with regexp support."
Cheetah says, "(Note that especially older things like egrep, sed, awk or emacs use similar, but different, and less rich syntax. They'll be easy to pick up if you know PCRE, though that's beyond the scope of this lecture.)"
Cheetah says, "One may wonder, though.. Is this whole regexp thing worth learning? Well, I'd say this: There will be very few things, if any, that can't be done without regexps. However, when used in the right places a few regexps can make your coding tasks a lot simpler (and for you code optimizing freaks, save on quite a few function invocations ;>). Also, regexps are a 'real life' skill. If you're a programmer, or even if you want to make other computer related tasks easier, you may find that knowing regexps can be pretty useful."
Cheetah takes a sip of water, shuffles the front piece of paper to the back, and peers about. "Okay, probably the first part of the lecture you'll all forget. Any questions?"
Trinsec shakes her head slightly.
Cheetah continues, "First off, a note: I learned a lot from the PCRE manual. On many Unix systems you can access it by typing 'man pcrepattern'. If you don't have access to such a system, there's a copy at http://www.pcre.org/pcre.txt; about halfway through, look for 'PCREPATTERN(3)'. I can recommend reading through it a bit every so often if you have a bit of spare time."
Cheetah says, "Now let's get on with the actual syntax. Regexps are used to match against stuff, and as you might expect, most characters match themselves. For example, the regexp 'a' will match the substring 'a'. The regexp 'meow' will match the substring 'meow', etc."
Cheetah says, "Note that I just said 'substring' not 'string'. Why is that? Well, if you want to match any line with a certain word in globbing you use '*foo*', but in regexps you don't. The regexp 'foo' will match the 'foo' in the string 'yummy food'. A technical way of saying this is 'regexps are not anchored by default'."
Cheetah says, "But what if you /want/ to match just 'foo', and not 'yummy food'? For those cases regexps have 'anchors': Something that matches only the start or end of a line. For example, '^foo$' will only match 'foo', not 'food', not 'yummy foo', and certainly not 'yummy food'. The ^ character indicates the start of the string /must/ be here, and the $ means the same but for the end of the string."
Cheetah says, "Of course, this introduces a small problem.. What if you want to match something like '$5 in change'? Well, for those cases regexp has the \ character, AKA the 'escape' character. For all characters with a special meaning the escape character will strip away that meaning if you put it in front of it. So the regexp to match our example becomes '\$5 in change'."
Cheetah says, "Conversely, the \ character can make some otherwise dull characters do something special. For example, \d means any digit, while \D means any non-digit. \s means whitespace, \S means non-whitespace. \w means any 'word' character and \W means any non-'word' character. As you can tell, often the capitalised version matches everything the non-capitalised version doesn't. As an example, the regexp '\d\D\d' will match '3d6' or '2x3' or '1 7', etc, but not '123' or 'id3'."
Cheetah says, "There are other uses for \ too. A few of those we'll go into in a bit, and some others I'll leave for you to discover by yourselves, though a lot of those aren't as useful in PennMUSH."
Cheetah takes another sip of water. "No questions yet? Cool. Either everyone's still with me, or most of you are snoring. I don't think I've gone on long enough to induce snoring yet? ;)"
Nämmyung laughs
Trinsec is listening intently. ;)
Tokeli zzz... zzz... zzz...
Cheetah gets back to his papers, "Now.. We can now match word characters and digits and spaces, but what if we want to match a specific set of characters we pick? Regexp can do that, too. For that it uses 'character classes'. Character classes start with [ and end with ]. The characters between those define what characters to match. So, if you wanted to match a single hexadecimal digit, you could use [0123456789abcdef]. (Note that I said a single digit. I'll tell you how to match more later.)"
Kimiko mews
Cheetah says, "Of course, we just learned that a normal digit is \d, so we can rewrite that to '[\dabcdef]'. But what about a through f? Surely that can be done in an easier way? And it can! You can specify a range with the - character. So we can rewrite our example yet again as [\da-f]. (If the - comes at the start or the end, it's taken as a literal -, so [+-] will match either + or -.)"
Cheetah says, "But what if you want any character /other/ than a few you specify? Regexp can do that too. If a character class starts with a ^, that means 'any character, but not these'. (Us regexp nuts could also say that ^ negates the class.) So for example, [^\d] means the same as \D, and [^aeiouy] means any character other than a vowel."
Cheetah says, "Finally, sometimes you don't much care what a character is, you just want /any/ character. Regexp uses '.' for that. The . in regexp works like '?' does in globbing. So the regexp 's..p' could match, for example, 'soup' or 'soap' or 's04p'. The only thing . doesn't match (by default) is a newline (%r in MUSHcode). Of course, . will match a literal '.' too, but if you want to match /just/ a period, and not another character, you can just use '\.'"
Cheetah eeps, lost one.
Cheetah says, "Oh well.. That's about all about characters, so I'll briefly touch on another subject, namely groups, and give you one use for them. Of course, groups are a big thing in regexp, so you'll learn more uses for them in a bit."
Cheetah says, "Basically, groups are a bunch of stuff between ()s. Groups can also be called 'subpatterns', and they match whatever they would without the ()s. For example, the regexp '(foo)(bar)' still matches the string 'foobar', the only difference being that the 'foo' bit and the 'bar' bit are grouped. Groups can also be nested, so '((foo)bar)' still has two groups, the group making up 'foo', and the group making up the 'foo' group followed by 'bar'. Don't worry if the distinction doesn't make any sense yet."
Cheetah says, "I promised a good use for groups, so here's the first: The | character introduces an alternate branch (which is a pretty way of saying | works as an 'or' sort of thing). For example, the regexp 'foo|bar' matches either 'foo' or 'bar'. But if you have a whole sentence you want to match, with only one word that changes, then what? Well, a group lets you use | on a limited scale. So, for example the regexp '(MUSH|MUX)code' matches either 'MUSHcode' or 'MUXcode'."
Cheetah pauses again. "Whew. If you folks are still all on track, I'm doing a lot better than I'd figured."
Nämmyung nods
Trinsec nods too.
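Editor's sketch: the basics covered so far (unanchored matching, anchors, escapes, character classes, the dot, and grouped alternation) can be tried outside the MUSH. The snippet below uses Python's `re` module purely as a stand-in; that choice is an assumption of this sketch, since the lecture is about PCRE inside PennMUSH, and the two dialects agree on these features but are not identical in general. The variable names are illustrative only.

```python
import re

# Unanchored: the pattern only needs to match a substring.
unanchored = re.search(r"foo", "yummy food") is not None

# Anchored with ^ and $: the whole string has to be exactly 'foo'.
anchored = [re.search(r"^foo$", s) is not None
            for s in ("foo", "food", "yummy foo")]

# Escaping: \$ is a literal dollar sign, not the end-of-string anchor.
escaped = re.search(r"\$5 in change", "keep the $5 in change") is not None

# \d\D\d: a digit, then a non-digit, then a digit.
dice = [re.search(r"\d\D\d", s) is not None
        for s in ("3d6", "2x3", "1 7", "123", "id3")]

# Character class with a range: one hexadecimal digit.
hex_digit = [re.fullmatch(r"[\da-f]", c) is not None for c in ("7", "c", "g")]

# The dot matches any single character (except a newline by default).
dot = [re.fullmatch(r"s..p", s) is not None
       for s in ("soup", "soap", "s04p", "sp")]

# Alternation limited to a group.
alternation = [re.fullmatch(r"(MUSH|MUX)code", s) is not None
               for s in ("MUSHcode", "MUXcode", "MOOcode")]

print(unanchored, anchored, escaped, dice, hex_digit, dot, alternation)
```

Here `re.search` plays the role of an unanchored match, while `re.fullmatch` is used where the whole string is meant to match, which saves writing ^ and $ each time.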
Cheetah forges on, "Now that we have groups covered, we can introduce the last big concepts in regexp basics: atoms and quantifiers. Those look pretty scary as far as words go, but no worries, you've sort of already seen atoms, and quantifiers aren't so bad."
Cheetah says, "Atoms, like in physics, are parts that regexp considers one thing that it can't break up any further. Anything that represents a single character is, of course, an atom. So for example 'a', '5', '\D', '[abc]', '[^+-]' and '.' are all atoms."
Cheetah says, "Go ahead, Nammy?"
Nämmyung says, "How about things like newline and return and stuff?"
Nämmyung says, "Are they considered a character or a special (escaped) string?"
Cheetah says, "In most cases a newline (%r) is just another character. Of course, you can't match across different 'sets of input'."
Cheetah says, "IE: If you have something that listens, and I @emit a multiline poem, it'll see it as one thing."
Nämmyung says, "okay"
Cheetah says, "If I use several says, it only sees those."
Cheetah says, "Err, individually that is."
Nämmyung says, "right"
Cheetah says, "All clear?"
Nämmyung nods
Cheetah sips his water and continues then, "Groups are also atoms, so the whole of '(foo)' or '([qwerty])' or '(meow|woof)' are all atoms. Note that /inside/ the ()s a group can have several atoms, outside the ()s the group as a whole is considered one."
Cheetah says, "(There are a few other things that can be atoms, but we haven't seen them yet, and they all make sense.)"
Cheetah says, "So why is it important that something is an atom? You may have guessed it, but it has to do with quantifiers. A quantifier specifies how often you want the atom preceding it."
Cheetah says, "For example, the * quantifier means 'zero or more of what came before me'. So the regexp 'baa*' can match 'ba' or 'baa' or 'baaaaaaaaaa', etc. The + quantifier is almost the same, meaning 'one or more of what came before me'. So 'baa+' will match 'baa' but not 'ba'."
Cheetah says, "Finally, the ? quantifier means 'one or zero of what came before me'. So, the regexp 'this lecture is (not )?boring' matches either 'this lecture is boring' or 'this lecture is not boring'. I hope it isn't, by the way ;)"
Trinsec giggles.
Cheetah says, "Of course, sometimes you want other quantities. For those cases there's the {x,y} notation. x is the minimum number of repetitions of the preceding atom you want, and y is the maximum. You can leave y empty if you don't want a maximum. So, 'a{2,4}' will match 'aa', 'aaa' or 'aaaa'. 'a{4,}' will match 4 or more 'a's in a row."
Cheetah sips his glass, but merely gulps air. Well, at least no arguing about whether it's half (empty|full).
Nämmyung throws tomato?
Cheetah dodges. "Okay, okay.. I'll get on with some examples then."
Cheetah says, "You can match any decimal number with '^[+-]?\d+(\.\d*)?$', IE: Start of line, then optionally a + or -, one or more digits, then optionally a dot and zero or more digits, then end of line."
Cheetah says, "Now is the time to raise your hand if you're missing something."
Cheetah listens to the crickets, and continues with his other example: "You can match a hexadecimal number (not counting fractions) with '^(0x)?[\da-f]+$', IE: start of line, then optionally 0x (us programmer types tend to use that to indicate something is in hex sometimes), followed by one or more characters that are either numbers, or in the range a through f, and then end of line."
Cheetah nods at the latecomer, and concludes this part of the lecture. "If you understood these basics you already know all the things you /need/ to know in order to write any form of useful regexps. Now all you need is to know how to use them in MUSHcode specifically (which I'll get to next). There's more to know, but I'll cover some in the 'advanced' bit, and what's left you can pick up from the manual, or from 'help regexp syntax', although the latter doesn't cover everything either."
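Editor's sketch: the quantifiers and the two worked patterns above (decimal and hexadecimal numbers) can be checked with a few lines of Python. Python's `re` is again only a stand-in for PCRE, and the sample strings are made up for illustration.

```python
import re

# * means zero or more of the preceding atom, + means one or more.
star = [re.fullmatch(r"baa*", s) is not None
        for s in ("ba", "baa", "baaaaaaaaaa", "b")]
plus = [re.fullmatch(r"baa+", s) is not None for s in ("ba", "baa")]

# ? makes the preceding atom (here a whole group) optional.
optional = [re.fullmatch(r"this lecture is (not )?boring", s) is not None
            for s in ("this lecture is boring", "this lecture is not boring")]

# {x,y} bounds the number of repetitions of the preceding atom.
bounded = [re.fullmatch(r"a{2,4}", s) is not None
           for s in ("a", "aa", "aaaa", "aaaaa")]

# The decimal and hexadecimal patterns quoted in the lecture.
DECIMAL = re.compile(r"^[+-]?\d+(\.\d*)?$")
HEX = re.compile(r"^(0x)?[\da-f]+$")

decimal = [DECIMAL.search(s) is not None
           for s in ("42", "-3.14", "+7.", ".5", "1.2.3")]
hexadecimal = [HEX.search(s) is not None
               for s in ("ff", "0x1a", "0xZZ", "FF")]

print(star, plus, optional, bounded, decimal, hexadecimal)
```

Note that 'FF' is rejected because the character class only lists lowercase a through f; a caseless match would be needed to accept it.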
Cheetah fetches a new glass of water, sips, and turns to his next piece of paper.
Cheetah says, "So, armed with all this knowledge, there's one more thing standing in your way to use them in Penn: How the heck /do/ you use them in Penn? Well, mostly, there are two ways:"
Cheetah says, "Functions. Penn has a bunch of functions that use regexps. A list of them can be found in 'help regular expression functions'. My personal favorite is regeditall() (or regeditalli(), depending). Note that the MUSH parser likes to eat characters that are part of your regexp. So characters like []{}\, all need an extra \. Of course, that can get very ugly, so if you have a regexp of any decent size, do yourself a favour, put the regexp in an attr of its own and v() it."
Cheetah says, "(As an aside, you don't need to escape ()s in a regexp. Penn keeps track of those for you.)"
Kimiko says, "(Yay)"
Cheetah says, "Commands and listens. These are a little less of a hassle, since all the special characters are left alone. However, be careful with the : character, because Penn treats it as 'end of command', so if that occurs anywhere in your regexp, it needs to be escaped, too."
Kimiko says, ":, or ;?"
Sketch says, "Colon."
Kimiko thought it was semicolon :o
Cheetah says, "As in $mycommand:@pemit %#=Yep, it works."
Sketch types --> &this me=$mycommand:think Blah;think foo
Cheetah says, "Not the ; that separates the two think bits in Sketch's example."
Rusty says, "Semicolon separates commands in the command list. Colon tells Penn where the end of the match pattern is and where the command list starts."
Kimiko says, "Oh! Right."
Kimiko says, "I was getting it mixed up in commandlists."
Cheetah says, "No worries. All clear?"
Kimiko nods
Trinsec nods.
Cheetah says, "Of course, commands and listens don't automatically assume regexp-ness, so you have to @set object/attr=regexp to make it work. (Don't forget to unset NO_COMMAND on a new object. I wish I had a quarter for every time someone remembered 'aha! I need to set the regexp attribute flag', then spent several minutes debugging their regexp because their object was still NO_COMMAND. I've done that myself more often than I'd like.)"
Rusty grins
Cheetah says, "Another big thing that goes wrong with regexps in commands is that us MUSH folks love +commands, so you get a regexp of something like '+roll \d+d\d+'. There are several problems there. First off, it's not anchored, so you could put anything before +roll or after the last digit and it'd still match. Second, the + is a special character, one that repeats whatever came before it. In this case there's nothing to repeat, so the regexp engine silently rejects your command, and it'll never match anything. So you'll want to use something like '^\+roll \d+d\d+$'."
Cheetah says, "Now, for normal (not regexp) commands, the first * or ? goes into %0, the next in %1, etc.. For regexps, of course, that doesn't work very well. Instead, what the first /group/ matched goes into %1, what the second matched in %2, etc. Groups are counted by where their opening ( is, so if you nest groups the outside ones will have lower numbers than the inside ones."
Cheetah says, "'But what about %0?', you might ask? Well, %0 will contain everything /matched by the regexp/. Note that this does not need to mean everything typed. For example, if your regexp is 'foo' and the player typed 'yummy food', %0 will contain just 'foo'. If your regexp is anchored with ^ and $, %0 /will/ be everything the player typed."
Cheetah says, "Aye, Nammy?"
Nämmyung says, "Is it possible to match glob * in a regexp command?"
Nämmyung says, "sort of mix and match?"
Cheetah says, "Well, you can't 'just' plop * in a regexp command, because * already means 'zero or more of what came before me'. However.."
Rusty says, "You could use '(.*)', right?"
Cheetah says, "We learned that . means 'any one character (other than newline)'. So in regexp '.*' does what * does in glob."
Nämmyung says, "Ah, and that better be the last thing you match for, right?"
Cheetah says, "And aye, if you want the result saved you put it in a group, so (.*) in full."
Cheetah says, "Not per se, nope."
Nämmyung okays and will mess with it later.
Cheetah says, "^\+whatever (.*)=(.*)$"
Nämmyung says, "ah, I see"
Cheetah says, "Works mostly exactly like you might expect. With one gotcha."
Trinsec recalls something about greedy?
Cheetah says, "Yep. Since I need to cover that anyway, now is the ideal moment."
Cheetah says, "By default, quantifiers are 'greedy'. IE: They like to match as much as possible."
Cheetah says, "So if the above regexp matched the string '+whatever cookie=yummy=yes', the first .* would be as greedy as it could."
Nämmyung nods
Cheetah says, "So %1 would be 'cookie=yummy' and %2 would be 'yes'."
Cheetah says, "To make them not be greedy, you put a ? after them."
Cheetah says, "So: ^\+whatever (.*?)=(.*)$"
Cheetah says, "Now the .*? will match as /little/ as it can."
Cheetah says, "So with the same example %1 would be 'cookie' and %2 would be 'yummy=yes'."
Cheetah says, "Am I making sense to everyone, still? ;)"
Nämmyung nods
Cheetah continues, then. "Of course, sometimes you don't care what's in a group, because really you're only interested in some other use of groups. In that case you can use a pattern like '(?:...)'. A 'non-capturing subpattern' in the lingo. It works just like a regular one, it just doesn't get a number. So '(?:foo|bar)' will match either 'foo' or 'bar' as usual. Don't forget to escape the : in a command or listen."
Molikai is still trying to catch up /and/ eat his dinner. ;)
Rusty says, "Can you nest parentheses to capture which one was matched?"
Cheetah says, "How do you mean, Rusty?"
Rusty says, "In other words, if I want a command '+choose foo|bar', can I match '^\+choose ((?\:foo|bar))$'?"
Rusty says, "And have $$1 be foo or bar?"
Rusty says, "Err $1"
Nämmyung says, "You wouldn't need to nest, though, would you?"
Cheetah says, "Sure. Though in this case you could, of course, leave the non-capturing group out."
Cheetah says, "Yes, Sketch?"
Sketch says, "I just wanted to point out... I often do stuff like this: +command(?\: (thing)) Do stuff:"
Sketch says, "Which is... *nods at Rusty* That? X)"
Cheetah thinks you perhaps dropped a ? there?
Sketch says, "Oh. I did, yeah. :)"
Trinsec thinks so too, at the end.
Cheetah says, "But yeah, that's a pretty good example of nesting capturing and non-capturing groups."
Cheetah says, "(As long as you remember the ? to make the outer one optional.)"
Cheetah shuffles a piece of paper to the back of the stack, and ends up with the first one back on top. "Goodie. I ran out of all I've prepared in advance, which means I covered all the essentials. For the next bit I'll take requests and suggestions from the audience, and go into a few of the topics I have left."
Cheetah says, "So everyone with questions not directly related to what I've said so far (but still relevant to regexps), now's your chance. And, yes Molikai?"
Molikai says, "Now, I've still to properly digest all this - I can tell I'm going to be having fun with this log and EXPERIMENTING! But! How much of this, and to what degree, is relevant for TinyMUX? :)"
Cheetah says, "Good question. I believe everything I said about regexps per se so far is still 100% valid. MUX does differ a bit in the MU*-side specifics."
Cheetah says, "For example, last I checked MUX didn't have regrep, so I had to use something involving regmatch() and map() or whatever."
Molikai thought that might be the case. Molikai will explore!
Cheetah says, "But even so, most of what I've said should apply."
Cheetah says, "Most of the differences are in things I haven't explicitly mentioned as far as I know."
Molikai says, "Danke."
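Editor's sketch: the points about +commands, capture numbering, greedy versus non-greedy quantifiers and optional non-capturing groups from the preceding stretch of the lecture can be reproduced as follows. This is Python's `re` standing in for PCRE, without any of the MUSH-side escaping (so no \: is needed); `group(0)` plays the part of %0 and `group(1)` the part of %1. One visible difference is an assumption worth flagging: Python raises an error for a leading bare +, whereas the lecture describes Penn as quietly never matching.

```python
import re

line = "+whatever cookie=yummy=yes"

# Greedy: the first .* grabs as much as it can before an '='.
greedy = re.search(r"^\+whatever (.*)=(.*)$", line).groups()

# Non-greedy: .*? grabs as little as it can.
lazy = re.search(r"^\+whatever (.*?)=(.*)$", line).groups()

# A bare leading + has nothing to repeat; Python refuses to compile it.
try:
    re.compile(r"+roll \d+d\d+")
    bare_plus_compiles = True
except re.error:
    bare_plus_compiles = False

# Escaped and anchored version of the +roll pattern from the lecture.
ROLL = re.compile(r"^\+roll \d+d\d+$")
roll_results = [ROLL.search(s) is not None
                for s in ("+roll 3d6", "please +roll 3d6 now")]

# The whole match (the %0 analogue) is only what the regexp matched.
whole_match = re.search(r"foo", "yummy food").group(0)

# Groups are numbered by the position of their opening parenthesis.
nested = re.search(r"((foo)bar)", "foobar").groups()

# Optional non-capturing group wrapping a capturing one.
OPTIONAL_ARG = re.compile(r"^\+command(?: (thing))?$")
optional_args = [OPTIONAL_ARG.search(s).group(1)
                 for s in ("+command", "+command thing")]

print(greedy, lazy, bare_plus_compiles, roll_results,
      whole_match, nested, optional_args)
```

When the optional group does not take part in the match, Python reports `None` for its capture; in Penn the corresponding %1 would simply be empty.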
Cheetah says, "I hope that answers your question, because I'm out of answer on that specific matter ;)"
Cheetah says, "Yeah, Trin?"
Trinsec says, "Could you go into detail about more advanced features of regexp? That {1} thing was new to me, for example."
Trinsec says, "(Which was talked about before the lecture)"
Cheetah says, "I'll want to go into some new topics in a bit, yes. Though note that the {1} thing is actually almost 'less' advanced. Since ? is really shorthand for {0,1} and * for {0,} and so on ;)"
Boris apologizes for being late....again...
Molikai says, "Darn Grues."
Trinsec says, "Oops, I meant (1) etc."
Sketch says, "Here on M*U*S*H, Molikai, we respect all races. :D"
Cheetah waves, "Well, no worries, I think the biggest inconvenience is on your end ;)"
Trinsec remembered the wrong kind of parenthesis.
Molikai has a log he can toss over when this is done. :)
Rusty dims the lights as much as he can while still making it possible to not trip over the projector.
Trinsec says, "So, I meant (1), not {1}."
Cheetah ahhs, "Well, the (1) bit is just part of the syntax. The full thing is (?(...)...|...) where ... depends on what you're doing ;) But yes, I'll go into that in a bit. 'Conditional subpatterns', they're called, by the way."
Trinsec cools.
Cheetah will want to cover two, maybe three things before going into them.
Trinsec nods.
Cheetah says, "Which, unless anyone else wants to ask a question at this point, I will do now."
Cheetah says, "Okay, first up: Assertions. Scary word."
Cheetah says, "Assertions basically mean 'make sure so and so is true'. They don't actually eat any characters, and sometimes they don't even deal with characters so much as the bits 'in between' characters."
Cheetah says, "Actually, you've already seen two assertions, but I didn't call them that at the time."
Cheetah says, "^ 'asserts' the start of a line, and $ 'asserts' the end. Neither matches a character directly, and neither 'eats' one. They just 'make sure' there's a start or end going on."
Cheetah says, "New ones: \b only matches the bit between a word and a nonword character. \B only matches the bit between two word or between two nonword characters."
Trinsec blinks.
Cheetah says, "So, for example, '\bfoo\b' matches 'foo' but only as a whole word."
Cheetah says, "IE: It'll match 'baz foo bar', but not 'bad food bar'."
Molikai says, "And \Bfoo\B would only look for foo as part of a word? such as asfoobar?"
Cheetah nods to Molikai.
Trinsec says, "What about . and , and such? That be \b?"
Cheetah says, ". and , are not 'word' characters, so yes, 'foo, bar' would match '\bfoo\b'"
Trinsec says, "And the digits?"
Cheetah says, "Are word characters, I believe."
Trinsec hmms and nods.
Cheetah says, "I think the rule was all letters, digits and '_'."
Cheetah says, "Much easier for me, the exact rule is \w ;)"
Trinsec wonders if you couldn't use \Wfoo\W?
Cheetah says, "Yes, except that'd require characters there, and would 'eat' them."
Trinsec ahhhhhs.
Cheetah says, "So if I had just 'foo' it wouldn't match, because there's no characters around it."
Trinsec nods and is enlightened.
Cheetah says, "The more tricky ones are the ones you write yourself."
Cheetah says, "An assertion can be positive or negative, and lookahead or lookbehind."
Cheetah says, "That's mostly self-explanatory."
Cheetah says, "For example, 'w+(?=;)', a 'positive lookahead' matches a word followed by a semicolon, but doesn't 'eat' the semicolon."
Cheetah says, "Err, imagine I said \w+ there ;)"
Sholevi writes Givur up for tardiness.
Cheetah says, "A negative lookahead would look like 'foo(?!bar)'. IE: Match any 'foo' as long as there's no 'bar' behind it."
Cheetah says, "So 'foo', 'foobaz', 'food storage' all work, but 'foobar bazbat' will not."
Trinsec looking a bit confused by the assertions and hopes for workful examples with those.
Molikai says, "Define 'behind'?"
Cheetah says, "Looking backwards."
Cheetah says, "Before the assertion itself."
Cheetah will try an example.
Cheetah says, "I have a regedit that ansifies a bit of text according to @@ and a letter."
Cheetah says, "Like '@@Rred @@Nnot red'"
Cheetah says, "I need to match the whole bit after and including the first @@, up until the second @@."
Cheetah says, "But if I use '@@(.)(.*)@@' it'll 'eat' the second @@, and matching restarts at the N."
Cheetah says, "So at the end of my regexp I have something like.."
Cheetah says, "(?=@@|$)"
Trinsec strains her brain, trying to make sense.
Cheetah says, "IE: go on until you see @@ or end of line as the next bit."
Cheetah types --> say regeditall(foobar foobaz foobat foobar foobaz foobat,(foo)bar,ansi(h,$1))
Cheetah says, "foo foobaz foobat foo foobaz foobat"
Cheetah says, "I wanted to ansify the 'foo' but only in 'foobar'."
Cheetah says, "But I'm matching the 'bar' bit."
foo foobaz foobat foo foobaz foobat
Cheetah says, "So regeditall eats it."
Cheetah types --> say regeditall(foobar foobaz foobat foobar foobaz foobat,(foo)(?=bar),ansi(h,$1))
Cheetah says, "foobar foobaz foobat foobar foobaz foobat"
Cheetah says, "Now I'm not /matching/ bar, I'm just making sure it's there."
Trinsec hmms, and usually solved it by putting 'bar' after the ansi. :D
Cheetah says, "Right ;) You see what I mean now though?"
Trinsec nods.
Trinsec says, "That's the positive, what about that negative?"
foobar foobaz foobat foobar foobaz foobat
Cheetah types --> say regeditall(foobar foobaz foobat foobar foobaz foobat,(foo)(?!bar),ansi(h,$1))
Cheetah says, "foobar foobaz foobat foobar foobaz foobat"
Trinsec ooooohs.
Cheetah says, "Ansify 'foo' but only if it's not part of 'foobar'."
Cheetah says, "Is it clearer now?"
Trinsec nodnods.
Cheetah says, "Okay, a lookbehind just looks in the other direction."
Cheetah types --> say regeditall(foobar foobaz foobat foobar foobaz foobat,(? say regeditall(foobar boobaz moobar foobar boobar moobar,(?
say regeditall(foobar boobar moobar foobar boobar moobar,\\b(\\S*?)(? for function<>s and & for &-substitutions): (?xs) ( \\(.) | &(.) | \[( (?R) | .*? )\] | (?:^|(?<=[<,]))\s*( \w+? )<( (?: (?R) | .*?)* (?: , (?: (?R) | .*? )* )* )> )"
Cheetah says, "You may go 'waugh!' now."
Trinsec waugh!
Molikai says, "WAAAAAUUGH!"
Sketch says, "Holy waugh, Batman!"
Molikai says, "Blistering barnacles."
Molikai says, "What does that /actually/ do?"
Cheetah says, "I recommend you don't puzzle it out until /after/ the lecture, if you are at all so inclined ;)"
Trinsec isn't inclined!
Sketch already puzzled it out, once, only to learn he *didn't need it* X)
Molikai is.
Cheetah says, "What I just said, parsing something that looks like MUSHcode, except it actually /does/ it."
Cheetah says, "While it looks really hairy, it does introduce something useful:"
Cheetah says, "See the (?xs) bit at the start?"
Trinsec nods?
Cheetah says, "Those are 'options' of sorts."
Molikai says, "Yeah?"
Cheetah says, "The general format is: (?letters)"
Cheetah says, "The 'x' is for PCRE_EXTENDED, which means any non-escaped whitespace is ignored."
Cheetah says, "So the regexp '(?x)^ f o o $' is exactly the same as '^foo$'"
Trinsec says, "Nifty."
Cheetah says, "Useful for organisation."
Cheetah says, "The 's' stands for PCRE_DOTALL. Which simply means that . will also match newlines in this regexp."
Cheetah says, "Then there's 'i' for PCRE_CASELESS, IE: makes the regexp case insensitive."
Cheetah says, "And 'm' for PCRE_MULTILINE, which isn't useful very often in Penn, so you can read up on it yourself ;)"
Sketch says, "'i' does the same as regedit and so forth, right?"
Cheetah says, "Yeah, pretty much."
Cheetah says, "Except inside groups it's local."
Rusty says, "Useful"
Cheetah says, "So you can make some parts case insensitive and some not."
Sketch says, "Wow. Didn't know that."
Trinsec oohs.
Cheetah says, "Any more questions about options?"
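Editor's sketch: the lookahead examples and the inline options discussed above can be replayed in Python, with `re.sub` loosely standing in for regeditall() and square brackets standing in for the ANSI highlight that the log cannot show. Several things here are assumptions of the sketch rather than anything from the lecture: the use of Python's `re` instead of PCRE, the full `@@` pattern (only its final `(?=@@|$)` part is quoted in the lecture; the lazy `.*?` in front is a guess), and the `(?i:...)` spelling for a locally caseless group.

```python
import re

text = "foobar foobaz foobat foobar foobaz foobat"

# Matching 'bar' consumes it, so the replacement loses it.
consumed = re.sub(r"(foo)bar", r"[\1]", text)

# Positive lookahead: 'bar' must follow, but is not part of the match.
positive = re.sub(r"(foo)(?=bar)", r"[\1]", text)

# Negative lookahead: only 'foo' that is NOT followed by 'bar'.
negative = re.sub(r"(foo)(?!bar)", r"[\1]", text)

# \b asserts a word boundary without consuming a character.
whole_word = [re.search(r"\bfoo\b", s) is not None
              for s in ("baz foo bar", "bad food bar", "foo, bar")]

# Assumed reconstruction of the @@ colour-code pattern.
codes = re.findall(r"@@(.)(.*?)(?=@@|$)", "@@Rred @@Nnot red")

# Inline options: x ignores unescaped whitespace, s lets . match newlines,
# i makes matching caseless.
extended = re.search(r"(?x)^ f o o $", "foo") is not None
dot_default = re.search(r"s.p", "s\np") is not None
dot_all = re.search(r"(?s)s.p", "s\np") is not None
caseless = re.search(r"(?i)meow", "MEOW") is not None

# Option limited to one group: only 'foo' is caseless here.
local = [re.fullmatch(r"(?i:foo)bar", s) is not None
         for s in ("FOObar", "FOOBAR")]

print(consumed, positive, negative, whole_word, codes,
      extended, dot_default, dot_all, caseless, local)
```

The three `re.sub` results line up with the three regeditall() calls in the log: the first drops 'bar', the second keeps it, and the third marks only the 'foo' in 'foobaz' and 'foobat'.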
Rusty says, "Just one: is that lovely mind-bending snippet of code you gave us actually in use on an object we can examine? :)"
Trinsec is trying to digest all this, man. Hope you made a summary somewhere that we can copy? :D
Molikai is logging this, Trinsec, and can send a copy to anyone who wants it. ;)
Cheetah says, "Yes, Rusty. I'll rummage for it after the lecture ;)"
Rusty says, "Thankee :)"
Cheetah will want one actually, Molikai ;)
Sketch says, "Eh?! ;)"
Molikai thought you might.
Cheetah says, "Also, remember from the start: The PCRE manual covers everything I've told you and more, though slightly more dryly so."
Cheetah says, "Anyway.. Now let's get on with Trin's favorite: Conditional subpatterns."
Cheetah says, "They work like (?(condition)if-yes|if-no) or (?(condition)|if-yes)"
Cheetah says, "A condition can be an assertion, a number, or 'R'."
Kimiko murrs and gots to go ... oh well
Sketch says, "Uh... wait.... |if-yes?"
Cheetah says, "Err, no, sorry. No | after the )"
Sketch says, "Okay. Heh."
Cheetah says, "So, assertion you know."
Trinsec listens.
Cheetah says, "A number can be the number of any group, like a backreference. It'll be true if that group matched anything."
Cheetah says, "The 'R' is true if we're currently in a recursive subpattern. If you ever find a use for that, and it /works/, please show me your code sometime ;)"
Sketch thinks he has one...
Sketch says, "((?R)[pemit(%#,STOPIT! STOOOOPIT!)])"
Rusty chuckles
Cheetah says, "Doesn't quite work like that ;)"
Sketch says, "I know. :) But... explain?"
Sketch thinks he just thought of an actual use now. I'll get back to you if it doesn't break. :p
Trinsec's brain's close to exploding.
Cheetah says, "Please do. Meanwhile, any questions on conditional subpatterns that /don't/ involve R? ;)"
Cheetah says, "Trin, I advise you to forget about the whole recursive thing for the next few months. It still bugs /my/ brain ;)"
Cheetah says, "Yeah, Sketch?"
Sketch says, "How could we do a conditional that only has a NO-case, not a yes-case?" Trinsec says, "Can we get to a conditional in the first place.. ;)" Molikai suspects this would be a lot easier with examploes. :) Cheetah says, "Empty yes case, or you make sure you 'invert' your regexp so you avoid the no case only thing." Cheetah fetches the example from earlier tonight. Cheetah says, "(?:(Yes)|No), I (?(1)do|don't)." Cheetah says, "That will match 'Yes, I do.' or 'No, I don't.'" Trinsec says, "Will this put do|don't into %1, and ignore the first group for that?" Cheetah says, "It'll put 'Yes' or '' in %1." Trinsec ohs. Molikai tests, and it does just what he says. Cheetah says, "That's also what I test for." Trinsec says, "What if you want to not have do|don't into a %x?" Trinsec says, "A : before or after (1)?" Cheetah says, "I don't think the conditional itself captures." Trinsec hmm! What if you wanted to? Cheetah says, "(?:(Yes)|No), I ((?(1)do|don't))." Trinsec nods. Cheetah says, "Note that I've found 2 major uses for conditionals:" Cheetah says, "1: Making context-correct regexp jokes." Cheetah says, "2: Impressing people with your regexp skills." Trinsec hehs. ;) Cheetah says, "I'm sure I've used one in actual code at least once." Cheetah says, "But I'm sure I've used them more often in the other two ways ;)" Cheetah says, "Making sense to everyone?" Trinsec still digesting. Trinsec is sure she'll bother you a lot more in the near future about regexps. Molikai nods. "Sort of. But (?:(Yes)|No), I ((?(1)do|don't)) doesn't appear to work... Cheetah says, "How are you testing, Molikai?" Sketch says, "Escape the comma." Molikai assumwes that's meant to spit out something different from the previous version, anyway. Molikai is doing a think regedit( on 'no, I don't.' Cheetah says, "With the , escaped in both the string and the regexp?" Molikai hmms. "Like so? 
regeditall(No\, I don't,(?:(Yes)|No)\, I ((?(1)do|don't)))
Cheetah says, "regeditall(No\, I don't,(?:(Yes)|No)\, I ((?(1)do|don't)),)"
Cheetah says, "You had no 'replace with'."
Molikai says, "That doesn't return anything.. :)"
Cheetah says, "Correct."
Cheetah says, "Because you replace the whole thing with ''."
Molikai says, "Ah! I see now. Dankeshun."
Cheetah uses regmatch() for tests like this.
Cheetah says, "Okay, I've got a few more general hints, and then I'd probably like to wrap up for now, assuming there aren't any more questions right now?"
Trinsec probably more questions later, but it's getting late and I've already lots to digest. ;)
Molikai nodses and agrees with trinny.
Cheetah nods, and is likely to be around at various points in the near future to help with regexp (and other) questions ;)
Trinsec yays. ;)
Cheetah says, "So, moving on to the end then?"
Trinsec nods, fine by me.
Molikai says, "ci?"
Cheetah says, "Right. Now, knowing my own reaction when I learned regexps, I went totally wild with them."
Cheetah says, "Eventually I calmed down, and eventually I made a rule people like Sketch and Jules are probably sick of hearing me say over time ;) (Not per se always to them, mind.)"
Cheetah says, "The most important thing about regexps, after knowing how to use them, is knowing when /not/ to use them."
Sketch says, "Aye."
Cheetah says, "Not everything works well as a regexp job. Especially if it's really easy with other code."
Cheetah says, "Sort of an example for this is the 'not match'. People go 'how do I write a regexp that matches everything other than foo?'"
Cheetah says, "For example, matching all attrs on an object that do /not/ match a certain pattern."
Molikai blinks. not(match(%0,foo))
Cheetah says, "Short answer: You don't."
Trinsec says, "what? [^f][^o][^o]?"
Cheetah says, "Long answer, in this case you just take all attrs, then take all matching attrs and use setdiff()."
Cheetah says, "Right, Trin.
That gets incredibly tedious for anything non-trivial."
Sketch says, "Especially if it's an actual regexp, not a string."
Cheetah says, "So remember (Molikai got the right idea straight away) that there /is/ still code other than the regexp functions ;)"
Cheetah nods to Sketch, "Quite."
Cheetah says, "Any last minute questions?"
Trinsec shakes head.
Rusty says, "How do you write a regexp to determine if the last question has been asked? ;)"
Sketch says, "Correct answer: Get someone else to do it."
Rusty says, "LOL"
Cheetah says, "I grab a hammer, and everyone who asks a question beyond what I deem to be the last is a problem, and hence suddenly looks like a nail ;)"
Molikai can probably get nails in here, if we poke him hard enough?
Cheetah grins.
Rusty says, "Great lecture, Cheetah. Quite informative. :)"
Molikai says, "Da. Lots to consume and digest. Shall I stop logging now? :)"
Cheetah says, "Okay. With that, then, I'd like to thank you for coming, and particularly, for participating. I hope you folks find use for what I said today, and that I was mostly understandable ;)"
Trinsec is interested in the logs, I presume it's going to be posted on mush.pennmush.org? :)
Molikai says, "Mostly! And Personally? I have /no/ Idea. I'll E-mail it to anyone who wants it."
Logging Ends.
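[Editor's sketch, not part of the original log.] The "take all attrs, then take all matching attrs and use setdiff()" advice from the end of the lecture, sketched in Python rather than MUSHcode. The attribute names and the helper attrs_not_matching are hypothetical, and Python's re module stands in for PCRE here; this only illustrates the idea and is not how PennMUSH implements it.

```python
import re


def attrs_not_matching(attrs, pattern):
    """Return the attribute names that do NOT match `pattern`.

    Rather than writing a regexp for "everything except X", match X
    normally and subtract the matches from the full list.
    """
    regex = re.compile(pattern)
    matching = {name for name in attrs if regex.search(name)}
    # Set difference: everything, minus whatever matched.
    return sorted(set(attrs) - matching)


if __name__ == "__main__":
    # Hypothetical attribute names on some object.
    all_attrs = ["DESCRIBE", "FOO_ONE", "FOO_TWO", "SUCCESS"]
    print(attrs_not_matching(all_attrs, r"^FOO_"))
```

The regexp stays a plain positive match and the negation happens in ordinary code, which is the point Cheetah makes about knowing when not to reach for a regexp.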