Ever find yourself sitting around on a Saturday night, wishing you had something to do, staring at the clock and watching it slowly, slowly tick towards bedtime? Of course not. None of us are that pathetic! However, from time to time you might be casting about for a project to show off your programming prowess. If it's in the SMF arena (why else are you here?), I've listed some ideas for programming projects for SMF. They might be implemented as standalone utilities, or they might be packaged up as mods.
Many of these utilities or mods come from SMF community forum requests for help in doing certain tasks. Unfortunately, the answer is often "there's no easy way to do that." I'm hoping that these projects will change that to "here's an easy way..."
If you know of an existing project that does what I describe, or you've written something and have made it publicly available, please let me know (PM to MrPhil on the simplemachines.org community forum). I will put your name in lights (OK, phosphors) and provide a link to the work from here. I'd love to see everything here marked "done"!
You are granted permission to link to this page, or any point on it, so long as you do not obscure or change any part of it, or claim authorship.
Some people change from the default Latin-1 character encoding to UTF-8 without doing a thorough job. That is, they've been running Latin-1 and manually convert the page encoding (and perhaps the language support files) to UTF-8, but don't translate the existing database text from Latin-1 to UTF-8. So, they end up with a database that's still holding Latin-1 (ISO-8859-1) encoded text, plus new posts and text encoded in UTF-8. For anything other than English, this means that old, Latin-1 accented characters will show up on a UTF-8 page as gibberish (question marks in diamonds, or whatever your browser uses to indicate an invalid UTF-8 character).
A utility is needed to go combing through all the text (or at least, the posts and personal messages), locate byte sequences that are not valid UTF-8 (because they will not display), and replace them with the appropriate UTF-8-encoded character. As there will likely be some UTF-8 text in there, a blanket search-and-replace can't be done, as it will corrupt some UTF-8 characters. Each byte will have to be examined one-by-one to see if it's the first byte of a valid UTF-8 multibyte sequence (two to four bytes long), and if not, the Latin-1 byte replaced with the appropriate UTF-8 sequence (always two bytes for a Latin-1 character).
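The byte-by-byte scan described above can be sketched as follows. This is only a minimal illustration in Python (a real SMF utility would more likely be PHP, working on database rows); the function name `repair_mixed_utf8` and its `fallback` parameter are made up for the example.

```python
def repair_mixed_utf8(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes that mix valid UTF-8 with stray single-byte text.

    Valid UTF-8 sequences are kept; any other high byte is decoded with
    the single-byte fallback encoding (an assumption: Latin-1).
    """
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:                      # plain ASCII, same in both encodings
            out.append(chr(b))
            i += 1
            continue
        # Sequence length implied by the lead byte (0 = not a lead byte).
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0
        chunk = raw[i:i + length]
        if length and len(chunk) == length:
            try:
                out.append(chunk.decode("utf-8"))   # strict: rejects bad sequences
                i += length
                continue
            except UnicodeDecodeError:
                pass
        # Not valid UTF-8 here: treat this one byte as single-byte text.
        out.append(raw[i:i + 1].decode(fallback))
        i += 1
    return "".join(out)
```

For example, `repair_mixed_utf8(b"\xc3\xa9\xe9")` yields two "é" characters, the first from a UTF-8 pair and the second from a lone Latin-1 byte. Note that two Latin-1 characters that happen to form a valid UTF-8 sequence will be read as UTF-8, so a real utility would probably want to log such cases for manual review.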
This need not be limited to "Latin-1 to UTF-8". It could conceivably be done in reverse, going from UTF-8 to Latin-1 (replace valid UTF-8 pairs and triplets with a single byte). It could also handle any other single-byte encoding, not just Latin-1. The former case might be used to get a mixed database (Latin-x and UTF-8) converted to just Latin-x, in preparation for a proper conversion to UTF-8. Of course, if a forum is running in UTF-8 for a while, it's possible that characters may have been entered that don't appear in the desired Latin-x encoding. Those should be noted by the utility, and the forum owner will have to go back and manually check and possibly fix them after the database has been converted to UTF-8. It's likely that already-UTF-8 encoded characters will be corrupted by conversion from Latin-x to UTF-8.
A feature of such a utility might be to scan the text in the database first, to see what target encodings are needed. If more than one is needed, it will be unable to successfully convert to one single-byte encoding; some UTF-8 would be left unchanged, and would then be corrupted when converting (translating) the database to UTF-8. In that case, the best solution may be to convert the single-byte encoding (say, Latin-1) to UTF-8 and leave the database in the "wrong" encoding. The administrator could also, at this point, export (back up, dump) the table to a file, empty the table, convert the empty table to the right encoding, and import the data (as UTF-8).
One problem frequently reported by users is that they run their forum with the pages displayed in one encoding (say, UTF-8), while the database is in another encoding (say, the default Latin-1). This can create problems if they ever want to change their database to UTF-8, say, as ALTERing the table encoding (I think) converts the existing data. If it does, that changes UTF-8 characters already in the database "from" Latin-1 to gibberish, as they weren't real Latin-1 codes to begin with. The conversion from Latin-1 to UTF-8 should be reversible, but other conversions may not be. Anyway, the idea is to find a way to change a database table encoding without changing data which is already in the desired encoding. I think it can be done by exporting the table to a file, emptying the table, ALTERing the table, and reading the backup file back in (claiming it to be in the new encoding). I don't know if it can be done in-place.
If there is already a mixture of valid and invalid data in the database (say, some Latin-1 encoded text along with some UTF-8), it becomes a rather sticky problem. The utility could go text field by text field, examining the data to see if it's legitimate in the new encoding, and converting it if it's not. I.e., all Latin-1 encoded characters in the upper 128 positions (all non-ASCII characters) are invalid in UTF-8 when they stand alone, and would need to be converted. However, any multibyte characters that are already legitimate UTF-8 should be left alone. It's possible that some of these are actually sequences of two Latin-1 accented characters that happen to be legitimate UTF-8 (although the second byte of the supposed UTF-8 character would fall in the range hex 80 to BF, which in Latin-1 is a control character or a symbol such as © or ¿, not a letter). In that case, perhaps the utility should flag questionables for the admin to manually inspect and edit.
The utility should either make sure the forum is in maintenance mode, or force it into maintenance mode itself. In the latter case, it would restore the forum to the mode it found upon startup. Needless to say, a hacker could almost certainly do some damage by running this utility, so some means needs to be taken to prevent unauthorized use. This might include a command line password, or it might mean a "drop dead" date and time that it will not run beyond (the administrator has to edit the code to change to another date and time).
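Both safeguards mentioned above are simple to sketch. The following Python is only an illustration: the `settings` dictionary stands in for wherever the forum actually keeps its maintenance flag, and the `DROP_DEAD` date is an arbitrary placeholder that the administrator would edit.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

# Hypothetical "drop dead" moment; edit this to allow the utility to run.
DROP_DEAD = datetime(2030, 1, 1, tzinfo=timezone.utc)

def may_run(now=None, drop_dead=DROP_DEAD):
    """Return True only while the current time is before the drop-dead time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < drop_dead

@contextmanager
def maintenance_mode(settings):
    """Force maintenance mode on, then restore whatever mode was found."""
    previous = settings.get("maintenance", 0)
    settings["maintenance"] = 1
    try:
        yield settings
    finally:
        settings["maintenance"] = previous   # restored even if the work fails
```

The utility would check `may_run()` first and refuse to do anything if it returns false, then wrap the real work in `with maintenance_mode(settings):` so the original mode comes back even after an error.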
Some people make the mistake of using Microsoft Word to type in and edit their posts, and cut-and-paste the text into SMF. Unfortunately, the default setup for Word is to use their sadly misnamed "smart quotes" for various punctuation marks (especially opening and closing single and double quotation marks, and various dashes). While these characters are typographically correct, they are using non-standard character encodings, and will cause problems when displayed in anything other than MS Word. While many of these codes are not actually widely used for control purposes, it is common to have problems when the text is displayed in a different encoding. That is, text written with CP-1252-only characters will fail to display properly on the most common page encodings, namely, ISO-8859-1 (Latin-1) and UTF-8 (Unicode).
A utility is needed to go through posts and find all illegal smart quotes, which are using reserved code points in the hex 8x and 9x range. It would convert them to the appropriate UTF-8 characters, or the closest character in whatever encoding the database uses.
Here are the "smart quotes" characters in CP-1252:
| Hex | Char | Unicode | HTML entity | Name | Reserved use |
|---|---|---|---|---|---|
| 80 | € | U+20AC | € | Euro | reserved control |
| 82 | ‚ | U+201A | ‚ | Low-"9" opening quotation mark | Break Permitted Here |
| 83 | ƒ | U+0192 | ƒ1 or ƒ | Florin/script f/folder | No Break Here |
| 84 | „ | U+201E | „ | Low-"99" opening quotation mark | Index |
| 85 | … | U+2026 | … | Ellipsis | Next Line |
| 86 | † | U+2020 | † | Single dagger | Start of Selected Area |
| 87 | ‡ | U+2021 | ‡ | Double dagger | End of Selected Area |
| 88 | ˆ | U+02C6 | ˆ | Circumflex ^ accent (combining?) | Character Tabulation Set |
| 89 | ‰ | U+2030 | ‰ | o/oo per mille | Character Tabulation with Justification |
| 8A | Š | U+0160 | Š1 or Š | S + caron accent | Line Tabulation Set |
| 8B | ‹ | U+2039 | ‹ | Single left angle quote < (guillemet) | Partial Line Down |
| 8C | Œ | U+0152 | Œ | OE ligature | Partial Line Up |
| 8E | Ž | U+017D | Ž2 or Ž | Z + caron accent | Single Shift Two |
| 91 | ‘ | U+2018 | ‘ | "6" opening quotation mark | Private Use One |
| 92 | ’ | U+2019 | ’ | "9" closing quotation mark/apostrophe | Private Use Two |
| 93 | “ | U+201C | “ | "66" opening quotation mark | Set Transmit State |
| 94 | ” | U+201D | ” | "99" closing quotation mark | Cancel Character |
| 95 | • | U+2022 | • | Solid bullet | Message Waiting |
| 96 | – | U+2013 | – | En-dash | Start of Guarded Area |
| 97 | — | U+2014 | — | Em-dash | End of Guarded Area |
| 98 | ˜ | U+02DC | ˜ | Tilde ~ accent (combining?) | Start of String |
| 99 | ™ | U+2122 | ™ | Trademark TM | reserved control |
| 9A | š | U+0161 | š1 or š | s + caron accent | Single Character Introducer |
| 9B | › | U+203A | › | Single right angle quote > (guillemet) | Control Sequence Introducer |
| 9C | œ | U+0153 | œ | oe ligature | String Terminator |
| 9E | ž | U+017E | ž2 or ž | z + caron accent | Privacy Message |
| 9F | Ÿ | U+0178 | Ÿ | Y + diaeresis/umlaut accent | Application Program Command |
Since ignorant members can make this a recurring problem, it might be useful to make a mod to check all posts as they're being entered or edited, and clean them up then and there. A utility would still be needed to find and clean up existing posts. Or, you can throw up your hands and just locate and translate (to the page's character set, or HTML entities) these bytes "on the fly". That is, scan for byte x'93' (for example) and replace it with the entity &#8220; (which displays as “).
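A rough sketch of the cleanup, in Python for illustration only. It assumes the post text has been read as Latin-1, so that the CP-1252 bytes show up as characters in the hex 80 to 9F range; the names `C1_MAP` and `fix_smart_quotes` are invented for the example. The mapping is built from Python's own `cp1252` codec rather than typed in by hand.

```python
def build_c1_map():
    """Map each CP-1252 byte in 0x80-0x9F to its Unicode character."""
    mapping = {}
    for byte in range(0x80, 0xA0):
        try:
            mapping[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue            # a few bytes are undefined in CP-1252
    return mapping

C1_MAP = build_c1_map()

def fix_smart_quotes(text, as_entities=False):
    """Replace stray CP-1252 'smart quote' codes in text read as Latin-1.

    With as_entities=True, numeric HTML entities are emitted instead of
    the Unicode characters themselves.
    """
    out = []
    for ch in text:
        repl = C1_MAP.get(ord(ch))
        if repl is None:
            out.append(ch)                      # ordinary character, keep it
        elif as_entities:
            out.append("&#%d;" % ord(repl))     # e.g. byte 0x93 -> &#8220;
        else:
            out.append(repl)
    return "".join(out)
```

The same function could serve both the "fix existing posts" utility and the "clean up as posted" mod; only where the text comes from would differ.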
Sometimes you have a forum that's gone inactive for a while, and want to revive it and make it look like it's alive and well (i.e., posts are being made now). There are a number of tables in SMF that include timestamps for various events, in both seconds-count form (Unix epoch seconds count) and in date-only format (yyyy-mm-dd). The utility needs to be careful not to accidentally put any dates in the future (this may be desired, so don't outright forbid it). Possibly, some tables and fields will have to be handled separately from others, so there needs to be a way to select which tables and fields get updated. The user would give a specific time and date to move the latest item to. As this isn't something that would be done very often, there's no need for a fancy interface — just put the settings in the PHP code. Oh, and be careful to handle user-input times with the correct time zone, and be aware of timestamps in the database created with now(), which apparently stores not a Unix-style seconds count, but one already offset by the server time zone.
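The arithmetic behind such a date shift is small enough to sketch. This Python illustration works on plain lists of values rather than real SMF tables, and the function names are made up; time zone handling, which the paragraph above warns about, is deliberately left out.

```python
from datetime import date, timedelta

def shift_offset(timestamps, target_latest):
    """Seconds to add so that the newest timestamp lands on target_latest."""
    return target_latest - max(timestamps)

def shift_epoch(timestamps, offset, now, allow_future=False):
    """Shift Unix-epoch timestamps, refusing future dates unless allowed."""
    shifted = [t + offset for t in timestamps]
    if not allow_future and max(shifted) > now:
        raise ValueError("shift would create timestamps in the future")
    return shifted

def shift_date(ymd, offset):
    """Shift a yyyy-mm-dd date by the same offset, rounded down to whole days."""
    days = offset // 86400
    return (date.fromisoformat(ymd) + timedelta(days=days)).isoformat()
```

One offset would be computed once (from the newest post, say) and then applied to every selected table and field, so that the relative spacing of events is preserved.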
If implemented as a standalone utility, the user needs to be careful to disable this script when done, lest some hacker find it and play with it! This might be best done as a "drop dead" time and date setting, after which the utility will no longer run (until the code is edited to change the date and time).
Sometimes SMF users ask about changing one member ID to another (not already in use), or compacting the list of IDs (member, message, personal message, etc.) for one reason or another. Here would be a pair of utilities to perform such operations. The core would be a function to change all instances of an ID to another, in selected tables. For example, you might want to change a particular user to be the (accidentally deleted) Admin, with a specific ID. Or, you might want to reassign postings by one member to another (say, someone registered twice, and has postings under multiple IDs, and you want to consolidate them). This is trickier than just changing a member ID to an unused ID, as you have to watch out for possible conflicts in message numbers.
Anyway, the idea would be to cleanly change all instances of one ID to another, along with all references to that number ("foreign keys") being properly changed, and in the correct order. The forum should be put into maintenance mode for work such as this (and a backup made, of course). There might be a function to put the forum into maintenance mode, rather than relying on the administrator to remember to do it. Also, note that if the new ID is larger than the previous largest ID, the database needs to make sure its "auto-increment" value is properly updated. This might be done with an ALTER or it might be done by adding and deleting dummy rows until the desired new ID is reached.
An extension of the above utility would be to compact all IDs down below some limit, or to where there are no "holes" in the list. I'm not sure what the point is of the latter case (maybe the admins suffer from OCD?), but in the former case, spammers have been known to fill up the message table with so many postings that the ID counter wraps around! The spam postings would be removed first, but this would leave a few legitimate postings way, way out there. What the utility would do would be to find the largest ID and the smallest "open" (unused) ID, and call the first utility (or functions) to move the largest ID down to the smallest open ID. Repeat until there are no gaps, or the largest ID is down below some limit. ALTER the auto-increment value so that it's at the correct point.
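The "largest ID down to smallest open ID" loop can be worked out on paper before touching the database. Here is a minimal Python sketch that only plans the moves; `compaction_plan` is a hypothetical name, and actually applying each move (including all the foreign key updates) is the job of the first utility.

```python
def compaction_plan(used_ids, limit=None):
    """Plan (old_id, new_id) moves that fill gaps from the top down.

    Stops when there are no gaps left, or (if limit is given) as soon as
    the largest ID is at or below limit. Also returns the value the
    auto-increment counter should be set to afterwards.
    """
    ids = set(used_ids)
    moves = []
    next_free = 1
    while ids:
        largest = max(ids)
        if limit is not None and largest <= limit:
            break
        while next_free in ids:          # find the smallest unused ID
            next_free += 1
        if next_free > largest:
            break                        # no gaps remain below the largest ID
        ids.remove(largest)
        ids.add(next_free)
        moves.append((largest, next_free))
    next_auto = max(ids) + 1 if ids else 1
    return moves, next_auto
```

For IDs 1, 2, 5, 9 the plan is to move 9 to 3 and then 5 to 4, leaving the counter at 5. Be aware that this changes the relative order of IDs, which may matter anywhere IDs are used for sorting.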
The utility should either make sure the forum is in maintenance mode, or force it into maintenance mode itself. In the latter case, it would restore the forum to the mode it found upon startup. Needless to say, a hacker could potentially do some damage by running either of these utilities, so some means needs to be taken to prevent unauthorized use. This might include a command line password, or it might mean a "drop dead" date and time that it will not run beyond (the administrator has to edit the code to change to another date and time).
Write a PHP script to go through all messages (posts) and signature entries in the database, looking for both explicit URL tags ([url] and [iurl]) and implicit ones (whatever text that SMF would turn into a link). When one is found, check it (possibly by feeding it to fopen()?) and log any error reported. If it is a 404 or server not found error, disable the link. In some cases it may be possible to actually fix the link, if there is a known pattern to find/replace.
This would be fairly CPU intensive, so you'd probably want to do something to break up the job into little chunks (say, do 50 posts and then let the CPU cool down for a few minutes before tackling the next batch).
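As a starting point, here is a sketch (Python, for illustration) of the two mechanical parts: pulling URLs out of a post body and splitting the work into batches. The regular expressions are my own rough guesses and will not match everything SMF would auto-link; the actual link check and the pause between batches are left out.

```python
import re

# [url]...[/url], [url=...]...[/url] and the [iurl] equivalents.
URL_TAG = re.compile(r"\[(i?url)(?:=([^\]]+))?\](.*?)\[/\1\]",
                     re.IGNORECASE | re.DOTALL)
# A rough guess at bare links; SMF's own auto-linking rules may differ.
BARE_URL = re.compile(r"https?://[^\s\[\]<>\"]+", re.IGNORECASE)

def extract_urls(post_body):
    """Return explicit (tagged) URLs first, then implicit (bare) ones."""
    found = []
    for match in URL_TAG.finditer(post_body):
        target = match.group(2) or match.group(3)
        found.append(target.strip().strip("\"'"))
    remainder = URL_TAG.sub(" ", post_body)      # avoid counting tagged URLs twice
    for url in BARE_URL.findall(remainder):
        found.append(url.rstrip(".,;:!?)"))      # drop trailing punctuation
    return found

def batches(items, size=50):
    """Yield successive chunks of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A driver loop would take one batch of posts, check each extracted URL, log the failures, and then sleep for a while before the next batch.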
Similar code could be used to find all occurrences of a given URL and change it or remove it (e.g., for when you change the name of your domain or installation path, and want to update all references in posts and signatures, or, you've changed another domain that you have links pointing to).
SMF 1 cannot "throttle" its email sending rate, so it can be a problem to send out a newsletter. You can run afoul of your hosting service's email caps (per minute, per hour, per day limits), since all the emails are dumped into the outbasket at once. SMF 2 will be able to throttle emails, but in the meantime, you may want to use a utility such as PHPlist to send your emails at a throttled rate. The problem is that PHPlist needs a CSV (comma-separated values) file listing members and their emails, and SMF provides no easy way to export this list.
A utility could export the smf_members table data to a CSV file, ready for PHPlist (or other programs) to use. Obviously the user email field needs to be output, but PHPlist can do mail-merge using other attributes, such as member names, ages, genders, date of most recent post, etc. It can also select which members to send to, based on criteria applied against the data in the CSV file. So, the utility needs to be configurable as to which fields get written out to the file. PHPlist also wants a header row labeling the fields, but other programs might not. The utility might even select which members to write out, based on posting activity or some other criteria. You might want to blast a friendly email reminder to members who haven't posted for some time (PHPlist may be able to select this, based on last post date). The sky's the limit!
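The configurable part of such an exporter is not much code. In this Python sketch, member rows are plain dictionaries and the field names (`email`, `name`, `posts`) are placeholders, not the real smf_members column names; `export_members` is a made-up function name.

```python
import csv
import io

def export_members(rows, fields, header=True, keep=None):
    """Write selected fields of selected members as CSV text.

    rows   : iterable of dicts, one per member
    fields : which keys to write, in order
    header : whether to emit a first row naming the fields
    keep   : optional function deciding whether a member is written
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(fields)
    for row in rows:
        if keep is not None and not keep(row):
            continue
        writer.writerow([row.get(field, "") for field in fields])
    return buf.getvalue()
```

Using the `csv` module rather than joining strings by hand takes care of quoting, for example a member name containing a comma.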
It would not be practical to erase such a utility when you're done with it, as it would be used fairly often (like every time you send out a newsletter). A hacker using this utility would not be able to damage the database itself, but depending on whether there are command-line parameters (URL Query String), they could mess up your mailing list CSV files, or at least, chew up some CPU cycles. Some sort of password protection on the command line might be desirable. This would not necessarily be an interactive program (with a GUI), although that could be done to facilitate what data gets output for what utilities.
(There are products out there which do at least some of this, but I don't know how good they are.)
A not uncommon request is for a way to archive older forum entries into static HTML files. You would want to preserve the board names, topic names (subjects), date and time, and member name. Attachments and avatars might (or might not) be brought over to the HTML world, or at least a list made of what needs to be moved (administrator configuration). All BBCode would be expanded to HTML, so that there are no dependencies on SMF. An index HTML file could be generated to allow a browser to go directly to a board, and then to a topic, looking much like a live SMF system. Presumably each topic would get one HTML file, although for very large topics it might be desirable to split it up into multiple pages. The administrator would select the range of dates (all, or ending at a certain time and date) and which boards to archive. A topic with newer posts might not be archived at all, or would have only the older posts archived.
Part of the archiving process would be to (usually) delete the archived material. If only part of a topic is archived, you want to keep the initial post, even if the older replies are archived and removed. Boards, child boards, and Categories would normally be preserved, even if they're now empty. Of course, the database should be backed up first, in case something goes wrong. There is an argument for making the deletion of old material (with or without archiving it first) a separate step, in that the archives could be checked and validated before vaporizing the original data. On the other hand, this means leaving the forum locked (maintenance mode) while that is being done, so that new posts aren't made, possibly messing up the process (e.g., a topic that would otherwise be deleted now has a new post, so it wouldn't be archived in the first place!).
The utility should either make sure the forum is in maintenance mode, or force it into maintenance mode itself. In the latter case, it would restore the forum to the mode it found upon startup. Needless to say, a hacker could do some damage by running this utility, so some means needs to be taken to prevent unauthorized use. This might include a command line password, or it might mean a "drop dead" date and time that it will not run beyond (the administrator has to edit the code to change to another date and time).
The support forum has many tearful pleas of "I accidentally deleted an entire topic/board and I need to get it back!" Provided that there is a recent backup, this can be done, but it's labor intensive and prone to error. A utility is needed to read a backup .sql file, pick out the desired entries, and put them into the database under their old dates. Care must be taken not to create duplicate IDs at any level, or get posts out of order. It would probably be easier to create a "new" board (if needed) and "new" topics and posts, and update their timestamps and member IDs to the original. Don't forget to do something about attachments — either remove them if they're unavailable, or put them into the right place under the right name if they still exist.
The utility should either make sure the forum is in maintenance mode, or force it into maintenance mode itself. In the latter case, it would restore the forum to the mode it found upon startup. Needless to say, a hacker could do some damage by running this utility, so some means needs to be taken to prevent unauthorized use. This might include a command line password, or it might mean a "drop dead" date and time that it will not run beyond (the administrator has to edit the code to change to another date and time).
Implement a mod for a [tree] tag to build a text + CSS tree structure with the child/root at left (and [rtree] to build with child/root at the right). Recursively define subtrees in the same manner, with any node having 0 or more subtrees. This could be useful for genealogy charts, sporting tournament brackets, and the like. Output could be HTML+CSS with character graphic lines, or a call to a PHP script as src to an <img> tag.
A sample input showing a person, their parents, and grandparents:
Both [rtree] and [tree] might be in the same post, coming down to the shared root (tournament winner) in the center. Vertical trees might be done too, but that would need to allow multiple children and possibly multiple parents (depending on the nature of the data—think "family tree").
Note that it might be easier to implement as a single [tree] with some sort of long data string, and build the tree all in one operation. () surrounds a tree or subtree, and | marks the dividing point between child(ren) and parent(s). :: divides multiple children or multiple parents. Use [] or {} to enclose "reverse" horizontal trees, perhaps with subtrees defined on the left and the child(ren) at the right.
From time to time you may wish to import documents prepared for other purposes into SMF (into a post). SMF cannot directly read formats such as Word, RTF, or HTML—it requires BBCode markup. There are a number of tools available to convert .doc or .rtf to HTML pages. Once you have HTML, if you wish to include it in a post, the HTML tags need to be converted to BBCode (e.g., <i> to [i]). In addition, the page overhead (<html>, <head> section, etc.) needs to be removed, leaving just your body text. There may be existing HTML to BBCode converters, but they should be used with caution, unless they state that they are tuned for SMF's particular version of BBCode. Finally, you cut and paste the converted document into the post.
It would be possible to embed an HTML document within an SMF post, either by changing HTML to BBCode during the posting process, or by pulling in the HTML document (body text markup only) during post display. This is probably something that you would not want random members doing. The reason that BBCode exists in the first place is that some HTML markup could be maliciously used to cause problems. The ability to pull in HTML markup should be restricted to administrators and other trusted people.
Note that this may make sense for very short documents, although even in that case, marking up a cut-and-pasted block of text by hand may be faster and easier than going through several conversion steps. For lengthy documents, stop and consider whether your members are really going to want to wade through something that long in the format of an SMF topic. For such documents, it may make more sense to leave the document itself as an HTML page (display) or PDF file (download). Make your SMF post a short summary or abstract of the full document, with a link to display the HTML page or download the PDF file.
See my SMF mod request for additional information.
It is a constant problem that other forums (vBulletin, phpBB, etc.) have BBCode with a slightly different format than SMF, e.g., [color="red"]text[/color] rather than [color=red]text[/color]. This is a problem when someone creates a posting on another forum, and then tries to cut and paste the raw code (including BBCode) to SMF. SMF can't always interpret the slightly different BBCode.
This project would be a mod to make SMF understand additional BBCode formats (both new tags and different syntax for existing tags, such as [color]). You would "drop in" a phpBB BBCode mod to extend the base SMF in this manner. As part of the same mod, or as a separate mod, the editor buttons would be modified to output BBCode compatible with the "other" forum's BBCode. Potentially, more than one foreign board BBCode could be supported in a given SMF installation, if the only difference is the input syntax. The mod would pull out the existing BBCode parse routine into a separate file, to which mods would add chunks of code for other forum support. Likewise, the editor buttons would be pulled out into a file which could be updated to produce a different syntax of BBCode.
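For the quoted-value difference (such as [color="red"] versus [color=red]), the input-side translation is little more than a pattern replacement. A Python sketch for illustration; which tags should be treated this way is an assumption, and `unquote_tag_values` is a hypothetical name.

```python
import re

# [tag="value"] or [tag='value'], with the same quote mark at both ends.
QUOTED_VALUE = re.compile(r'''\[(\w+)=(["'])([^"'\]]*)\2\]''')

def unquote_tag_values(text, tags=("color", "size", "font")):
    """Strip quotes around tag values, but only for the listed tags."""
    def replace(match):
        if match.group(1).lower() in tags:
            return "[%s=%s]" % (match.group(1), match.group(3))
        return match.group(0)            # leave unknown tags untouched
    return QUOTED_VALUE.sub(replace, text)
```

A per-forum "drop in" would then amount to a list of such rules, one set for each foreign BBCode dialect.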
SMF 2.0 provides a post editor mode (toggled) that is semi-WYSIWYG (What You See Is What You Get). Most, but not all, BBCode tags get converted to HTML markup (as with the Preview) and the editor mode handles changes in WYSIWYG mode. Some tags, such as [quote], don't get converted to WYSIWYG. When toggling back and forth between modes, some tags may be permanently added, such as [url] around Web addresses.
SMF 1.1 could use a good WYSIWYG editor. It would need to accept regular BBCode markup from the server, convert it to HTML for display and edit, and convert back to BBCode for saving. All BBCode tags should be properly rendered in HTML, as is done in Preview mode. Basically, the whole BBCode support from SMF would have to be loaded into the page as JavaScript, and the inverse operation would need to be done.
To ensure fidelity in the round trip between modes, when BBCode is converted to HTML (with the same CSS usage as in the Preview/post display), embed the original BBCode as HTML comments. [i]text[/i] might become <!-- [i] --><i>text<!-- [/i] --></i>, etc. The comments would not be seen by the user and would not be affected by their editing, except that if the italicization is removed from the text, the comments disappear along with the <i> tags. Of course, this is trivial with simple tags like <i>, but becomes important with complex tags such as [quote], [code], or [url].
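Here is the comment-embedding idea reduced to the two trivial tags, as a Python sketch (a real editor would do this in JavaScript in the browser). The function names are invented, the tag-to-HTML mapping is assumed to be one-to-one, and no HTML escaping is done.

```python
import re

# Only [i] and [b] are handled; each is assumed to map to <i> and <b>.
BB_TAG = re.compile(r"\[(/?)(i|b)\]")
MARKED_HTML = re.compile(r"<!-- (\[/?(?:i|b)\]) --></?(?:i|b)>")

def bbcode_to_html(text):
    """Turn [i] into <!-- [i] --><i>, [/i] into <!-- [/i] --></i>, etc."""
    def replace(match):
        return "<!-- %s --><%s%s>" % (match.group(0), match.group(1), match.group(2))
    return BB_TAG.sub(replace, text)

def html_to_bbcode(html):
    """Recover the original BBCode from the embedded comments."""
    return MARKED_HTML.sub(r"\1", html)
```

Since the original tag text rides along in the comment, the return trip does not have to guess which BBCode produced a given piece of HTML. That matters little for [i], but a lot for tags with parameters.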
Something often requested is a way to turn words in a posting into links to another page. The usual suggestion is to use the "word censor" function to replace a word with a link (HTML or BBCode — depending on whether censoring comes before or after BBCode expansion). There are several known problems with this:
What might be best is to adapt the "word censor" function so that:
Unfortunately, it seems that some rather juvenile people have gotten into the development of Linux spelling checker software, and have inserted obscenities into the dictionaries. While this may not be a problem on some forums, on others, members are horrified to have rather risque words suggested as spelling corrections. Until such time as the spelling checker utilities themselves offer a flag for choice of "full language" (warts and all) or only "G-rated" suggestions, SMF will need a mod to examine returned spellings and filter out undesirable ones. This code could be included in the base product, but the list of words should be up to the forum owner. Whatever "censorship" is done should be up to them, as they know their intended audience. The mod would simply look for certain words (and word fragments), and remove them from the suggestion list. If the result is that there are no words left in the suggested spellings list, treat the word as correctly spelled.
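The filtering step itself could look something like this Python sketch. The function names are hypothetical, and the word lists would come from the forum owner.

```python
def filter_suggestions(suggestions, banned_words, banned_fragments=()):
    """Drop suggestions that are banned words or contain banned fragments."""
    banned = {word.lower() for word in banned_words}
    fragments = [frag.lower() for frag in banned_fragments]
    kept = []
    for word in suggestions:
        lower = word.lower()
        if lower in banned:
            continue
        if any(frag in lower for frag in fragments):
            continue
        kept.append(word)
    return kept

def treat_as_misspelled(suggestions, banned_words, banned_fragments=()):
    """If nothing survives the filter, treat the word as correctly spelled."""
    return bool(filter_suggestions(suggestions, banned_words, banned_fragments))
```

Matching is case-insensitive, but the surviving suggestions keep their original capitalization.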
A request in the 1.1 Support board for a way to send a newsletter to everyone when a new topic was created got me to thinking about the following: implement a generalized way to do specified things on specific actions. For example, create an empty stub of a function, called whenever a topic is created. The forum owner would fill in whatever action they wanted to take; for example, we might provide a /* commented out */ routine to blast a newsletter to all members (possibly limited to certain boards, or to certain groups, or if started by certain members). Forum owners would customize the PHP code to do whatever special (nonstandard) actions they wanted.
There could be specific routines (stubs) for a number of actions: on_topic_create(), on_board_create(), on_post_message(), on_move_message(), etc. Or, one catch-all routine that includes a parameter for what happened (TOPIC_CREATE, TOPIC_MOVE, TOPIC_DELETE, BOARD_CREATE, BOARD_MOVE, etc.) as well as the specifics of category, board, topic numbers, etc. There might even be "before" and "after" operation calls -- say, it might be used to check content or permissions before allowing an operation (returns OK or FORBID), in addition to doing some sort of cleanup or broadcast after the operation (returns OK).
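The catch-all variant, with "before" and "after" calls, might be sketched like this (Python for illustration; in SMF it would be PHP). The `register` and `fire` functions and the example handler are all made up; the event names are the ones suggested above.

```python
OK, FORBID = "OK", "FORBID"

handlers = {}   # (event, phase) -> list of owner-supplied functions

def register(event, phase, func):
    """Attach an owner-written function to an event and phase."""
    handlers.setdefault((event, phase), []).append(func)

def fire(event, phase, **details):
    """Call every handler; a 'before' handler may veto the operation."""
    for func in handlers.get((event, phase), []):
        result = func(**details)
        if phase == "before" and result == FORBID:
            return FORBID
    return OK

# Example owner-added rule: nobody may start topics in board 5.
def no_topics_in_board_5(board, **_):
    return FORBID if board == 5 else OK

register("TOPIC_CREATE", "before", no_topics_in_board_5)
```

With nothing registered, `fire()` just returns OK after an empty loop, which fits the idea that an unused hook should cost next to nothing.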
Would this be useful enough to implement? What other things might be done besides a broadcast? One of the concepts is that all these custom functions could be rounded up and put in one place, rather than scattering code all over SMF. If the function ships with nothing but comments, I wouldn't think it would have any noticeable impact on speed. Of course, PHP coding would be required by the forum owner, but there could be mods for certain commonly-requested operations that would take care of that. If this were to be made a standard SMF feature, no routine SMF operations would be included in this routine — just owner-added function.
When a member makes a post that is a certain milestone (by count), say, their 100th, invoke a PHP script to do something, such as update a table entry or send them a congratulatory email. At a minimum, this would involve a table with post count (or a range of post counts) and a PHP script file name to include() (and thus execute). Note that if the member exceeds the count and gets whatever action, and then has posts removed by a moderator, then posts more, they'll get the action again. You may want to put a flag somewhere that would only allow a milestone to be reached once. If posts made and posts removed are tracked separately, you might only look at the "posts made" count, not the net number of posts. That could still be open to abuse, as some members might churn out a bunch of useless posts just to make some milestone. Something will be needed in Admin to enter/edit/delete the database entries, as well as any member flags.
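The once-only flag can be as simple as remembering which milestones a member has already been given. A Python sketch with invented names; the set of awarded milestones would be stored per member in the database.

```python
def milestones_due(posts_made, milestones, already_awarded):
    """Return (newly due milestones, updated set of awarded milestones).

    posts_made should be the count of posts ever made, not the net count
    after removals, so that deleted posts cannot re-trigger an award.
    """
    due = [m for m in sorted(milestones)
           if m <= posts_made and m not in already_awarded]
    return due, set(already_awarded) | set(due)
```

The caller would run the configured script for each milestone in the first result, then save the second result back as the member's flags.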
The idea is to provide a function call (PHP skeleton) to examine the contents of a post, and take action upon what is found, when the poster clicks "Post". All forums have problems with spammers registering, and then flooding the forum with spam. Many forums have problems with abusive members, or jerks who just post nonsense or a one-word reply or SHOUT. All these things could be checked and dealt with.
Other aspects of the posting can be checked, so long as we're here, such as screening out banned posters and spammers posting too often (minimum interval, for flood control) from an IP address, or more than a certain number of times a day.
returns
In addition to the return code, a message would also be returned (empty string if nothing to say):
Within check_post(), first translate (in a local copy) digits and punctuation commonly used by spammers to evade word checks, such as "v1@gra" becoming "viagra". Perhaps check word fragments against a dictionary and remove unwarranted spaces ("vi ag ra" is three non-words; change to "viagra"). Calculate a spam_index of banned words showing up in text — the more questionable words, the higher the index. Do the same for abusive words (e.g., calling someone an @$$hole) in abuse_index. Calculate a shout_index as the fraction of letters that are UPPERCASE. Get the word_count (one may wish to ban one or two word posts). Count links onsite, to approved sites (need a list), and to unapproved sites, as well as the density of these links.
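A few of these measurements are sketched below in Python, purely as illustration. The substitution table is a small guess at common evasions, the helper names are mine, and removing all non-letters before matching is a crude stand-in for the dictionary-based space removal suggested above (it can produce false hits across word boundaries).

```python
import re

# Guessed digit/punctuation substitutions; a real list would be longer.
LEET = str.maketrans({"1": "i", "!": "i", "@": "a", "4": "a", "3": "e",
                      "0": "o", "$": "s", "5": "s", "7": "t"})

def normalize(text):
    """Lowercase a local copy and undo common character substitutions."""
    return text.lower().translate(LEET)

def spam_index(text, banned):
    """Sum of weight * occurrences for each banned word (dict word -> weight)."""
    collapsed = re.sub(r"[^a-z]", "", normalize(text))   # also closes up "vi ag ra"
    return sum(weight * collapsed.count(word) for word, weight in banned.items())

def shout_index(text):
    """Fraction of letters that are uppercase (0.0 if there are no letters)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)

def word_count(text):
    """Number of word-like runs of letters, digits, and apostrophes."""
    return len(re.findall(r"[A-Za-z0-9']+", text))
```

An abuse_index would work the same way as `spam_index`, just with a different word list.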
Mix these indices together, based on some weighting formula that includes how recently the member registered, how many post attempts they've made while brand new, and how many posts they already have. The idea is to be stricter with brand-new members, who may be spammers. A forum owner could also be stricter with a list of "problem members" (e.g., all of a certain member's posts could be action 3). Based on the overall value (score), return 0 to 4 as an action. There would be a degree of "fuzziness" in the logic, so rather than sharp cutoffs, one option or check blends into others with a degree of severity.
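One possible way to combine the indices is sketched below, in Python for brevity. Everything specific here is an assumption: the 30-day and 50-post scales, the threshold values, and the names newness_factor and choose_action. It also uses sharp cutoffs for simplicity, where the text above suggests something fuzzier.

```python
def newness_factor(days_registered, post_count):
    """Scale from 2.0 (brand-new member) down to 1.0 (established member)."""
    age = min(days_registered / 30.0, 1.0)    # 0..1 over the first 30 days
    history = min(post_count / 50.0, 1.0)     # 0..1 over the first 50 posts
    return 2.0 - (age + history) / 2.0

def choose_action(indices, weights, days_registered, post_count,
                  thresholds=(1.0, 2.0, 3.0, 4.0)):
    """Blend the indices into one score and map it to an action code 0..4."""
    score = sum(weights.get(name, 0.0) * value for name, value in indices.items())
    score *= newness_factor(days_registered, post_count)
    # Each threshold crossed raises the action by one step.
    return sum(score >= limit for limit in thresholds)

indices = {"spam": 3.0, "shout": 0.5}
weights = {"spam": 0.5, "shout": 1.0}
new_member = choose_action(indices, weights, days_registered=0, post_count=0)
old_member = choose_action(indices, weights, days_registered=365, post_count=500)
```

The same post scores 4 for the brand-new member and 2 for the established one, and with every weight set to 0 the result is always 0.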
We can provide the basic functions and allow the forum owner to mix and match what weightings to give to various factors. It should be easy to change what things are looked for and what weightings are used, but at some point, the owner is going to have to do some PHP coding (or at least, read what's provided and be able to tweak settings). It should be easy to add a list of banned words to be looked for, possibly as an array defined in a separate include file. Some forums may wish to be lenient on some words while others may wish to be very strict. The banned word array might include a weighting factor for each word given. Don't forget spelling variations (legitimate, typos, and evasion attempts).
For SMF 1.1, not all of the 0–4 actions may be applicable (e.g., no moderator review of posts, unless that's added in as a separate mod), so either that redirection to a restricted board would have to be added to the mod, or it would have to be treated as 2 or 4. There's already the capability to warn a poster that someone else has posted while they were typing their post, so that mechanism could be co-opted for action 2.
The forum owner may wish to run a "trial period" with the mod, where actions 1–4 are merely logged, and the member doesn't see anything different. Once the forum owner has tweaked the settings to their satisfaction, they can go live.
I have seen a number of requests (mods and features) for code to do many of these functions separately. The idea here is to put them all in one central place, with a single mechanism for inviting the poster to revise the text, solve a CAPTCHA, route the post to an inspection queue, or reject the post. As an anti-spam tool, requiring a CAPTCHA for the first N posts over M days since registration would be quite useful. Members who are on some kind of warning or probation could get special (stricter) treatment. The mechanism could be expanded to disallow starting a topic until some criterion has been met. This might also be done by maintaining post count-based groups, but that wouldn't handle "special members" or spammers who flood existing topics with spam in order to get out of a count-based group restriction.
Given the member's preferred language, the function would return the message string in that language (if available). For example, a code of 1 and "As a new member, you must solve a puzzle to prove that you're not a spambot.", that sort of thing.
By putting all this in one routine, the forum owner is able to mix and match and weight criteria, and even add their own criteria. Being able to combine and weight different checks may catch spammers and other abusers who are just able to slip under the radar with separate checks. The default, as shipped, might generate some of the check values, but weight them all as 0, so that all posts pass (return code 0). There could be a "weighting values" interface in admin so that code-phobic owners can enable and adjust the function without dirtying their hands with code. That could handle the default set of factors, but would have to be manually updated if any new factors were added.
A new threat from spambots is copying all or part of an earlier post in a topic, and either adding some spam content after that or just using a spammy signature block. The idea is to disguise the post content as legitimate content while still inserting spam links. SMF needs to check a new post (the material not in a [quote] block) to see if it's a copy of an earlier post. A [quote] block with nothing else (no new material) is also a dead giveaway. It's reasonable to compare a new post against earlier posts in this topic, but probably not worth the overhead of checking against the entire post (messages) database. Anyway, if a new post seems to be largely or entirely a copy of an earlier post, at the least this should generate an automatic "report to moderator" email to take a look at it.
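A rough sketch of such a check, in Python for brevity, could use the standard library's difflib to score similarity. The 0.9 threshold, the 40-character minimum, and the function names are assumptions for illustration, and nested [quote] blocks are not handled.

```python
import re
from difflib import SequenceMatcher

QUOTE_RE = re.compile(r"\[quote[^\]]*\].*?\[/quote\]", re.IGNORECASE | re.DOTALL)

def strip_quotes(body):
    """Remove [quote]...[/quote] blocks (non-nested) and outer whitespace."""
    return QUOTE_RE.sub("", body).strip()

def looks_copied(new_post, earlier_posts, threshold=0.9, min_len=40):
    """True if the unquoted text is empty, contains an earlier post, or nearly matches one."""
    fresh = strip_quotes(new_post).lower()
    if not fresh:
        return True  # nothing but a quote block
    for old in earlier_posts:
        old_text = strip_quotes(old).lower()
        # An earlier post copied whole, with spam tacked on after it.
        if len(old_text) >= min_len and old_text in fresh:
            return True
        if SequenceMatcher(None, fresh, old_text).ratio() >= threshold:
            return True
    return False
```

Only the posts of the current topic would be passed in as earlier_posts, which keeps the cost proportional to the topic length rather than to the whole messages table.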
Sometimes an administrator wants to remove all or part of a certain member's postings and PMs, as they are spam or otherwise "undesirable". This utility would go in and delete all this member's posts or PMs according to certain criteria set by the administrator (date range, certain words, etc.). There would be a flag to delete entire topics that this member started, or just replace the initial posts with something empty and harmless. Usually, if the first post was spam, the replies are not worth keeping, and can be deleted.
The utility should either make sure the forum is in maintenance mode, or force it into maintenance mode itself. In the latter case, it would restore the forum to the mode it found upon startup. Needless to say, a hacker could do some damage by running this utility, so some means needs to be taken to prevent unauthorized use. This might include a command line password, or it might mean a "drop dead" date and time that it will not run beyond (the administrator has to edit the code to change to another date and time).
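The "drop dead" check and the save-and-restore of maintenance mode might be arranged as in this sketch (Python for brevity). The settings dictionary is a hypothetical stand-in for wherever the forum keeps its maintenance flag, and run_cleanup is an invented name.

```python
from datetime import datetime

def run_cleanup(settings, now, drop_dead, cleanup):
    """Run cleanup() in maintenance mode, then restore the previous mode."""
    if now > drop_dead:
        raise PermissionError("utility has expired; edit the drop-dead date to run it")
    previous = settings["maintenance"]
    settings["maintenance"] = 1
    try:
        return cleanup()
    finally:
        settings["maintenance"] = previous  # restored even if cleanup fails

settings = {"maintenance": 0}  # stand-in for the forum's stored setting
result = run_cleanup(settings, datetime(2011, 1, 1), datetime(2011, 1, 2), lambda: "done")
```

Restoring the saved value, rather than simply switching maintenance mode off, leaves a forum that was already in maintenance mode the way it was found.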
This mod would, at user request, not look inside of code blocks when searching for text. That should greatly cut down on "false positives" returned from a search. Unfortunately, the searching is done by MySQL, so there doesn't seem to be a way to tell it to exclude anything between [code] and [/code] from its search. Perhaps it could be done with SQL procedures, or with plug-ins to MySQL, but that's well beyond my ability (and most shared server hosts are not going to be keen on letting customers install modules into MySQL).
There are at least two approaches. The first would be to split up the post, at creation (or edit) time, into non-code and code blocks. They would be stored separately and searched separately, and glued back together at edit or display time. Since there could be many [code] blocks interspersed among non-code, some way of packaging each into one entry would be best. A variation on this would be to package it all into one database entry (as at present), but pull out the code blocks into "subroutines" that are placed at the end. Upon retrieval, SMF would check if the first "hit" was in the "code subroutines", and would ignore the entry if code blocks were being excluded.
Another way would be to leave the database as-is. When an entry comes back with a "hit", SMF would remove any code blocks and test again to see if there's still a hit. This means duplicating MySQL's search functions, to work on a text string that needs to be re-searched after code is removed. Needless to say, either method will be a performance hit. The idea is that CPU cycles are relatively cheap, while people time is expensive, and the fewer results one has to look through, the better.
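The second approach, re-testing each hit after removing code blocks, can be sketched like this (Python for brevity). It assumes a plain substring search for a single term, which is much simpler than what MySQL does; the function names are hypothetical.

```python
import re

CODE_RE = re.compile(r"\[code[^\]]*\].*?\[/code\]", re.IGNORECASE | re.DOTALL)

def hit_outside_code(body, term):
    """True if term still appears once [code] blocks are removed."""
    visible = CODE_RE.sub(" ", body)
    return term.lower() in visible.lower()

def filter_hits(rows, term):
    """Keep only the database hits whose match is not solely inside code."""
    return [row for row in rows if hit_outside_code(row, term)]

rows = [
    "Use include here [code]echo 1;[/code]",
    "Try this: [code]include('x.php');[/code]",
]
kept = filter_hits(rows, "include")
```

Only the first row survives, because in the second the word appears only inside the code block.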
Some people prefer to receive their forum updates in a daily mailing, or "digest". This could be done with a cron-activated utility which reads all the requested boards for activity since the last digest mailing, and bundles them up into one email and sends it off. A couple of notes: 1) if there is a lot of activity, the digest will have to be split into smaller pieces, perhaps mailed at intervals, and 2) you will need some mechanism for throttling your email send rate, or else you will exceed your host's email caps.
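Both notes can be illustrated with a short sketch (Python for brevity). The size limit, the per-run batch size, and the function names are arbitrary assumptions; the real limits would come from the host's email caps.

```python
def split_digest(entries, max_chars=20000):
    """Group entries into parts of at most max_chars each.

    A single entry larger than max_chars gets a part to itself.
    """
    parts, current, size = [], [], 0
    for entry in entries:
        if current and size + len(entry) > max_chars:
            parts.append(current)
            current, size = [], 0
        current.append(entry)
        size += len(entry)
    if current:
        parts.append(current)
    return parts

def next_batch(queue, per_run):
    """Split the outgoing queue into what to send now and what waits."""
    return queue[:per_run], queue[per_run:]

parts = split_digest(["a" * 10, "b" * 10, "c" * 10], max_chars=25)
send_now, waiting = next_batch(["m1", "m2", "m3"], per_run=2)
```

Each cron run would send one batch and leave the rest queued for the next run, which is one simple way of staying under a send-rate cap.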
A related proposition: some people would prefer to submit postings by email, rather than having to sign on to post. This could also be useful for automated feeds from other programs. Automatically handling incoming emails and getting them into SMF is going to be somewhat system-dependent, but once the email has been read by SMF (a separate utility?), its ID and password can be verified, and the text can be posted to the appropriate board or topic. Perhaps SSI.php has sufficient function to permit this? Members could be limited to email postings in specified areas, if the administrator wishes to restrict them.
As I understand it, all groups in SMF are inclusionary groups. That is, if a member is part of any allowed group, they're in. At times, though, one wishes to exclude members based on group membership. It wouldn't matter that they are members of one or more inclusionary groups; if they're a member of any listed exclusionary group, they're dead meat. This wouldn't necessarily mean that the group mechanism would change, but that a given group might be treated as "in" or "out" depending on where you are in SMF. For example, a group membership based on post counts might put newbies into one group and old-timers in another. "newbies" would be required to solve a CAPTCHA in order to post, while "old-timers" wouldn't. "old-timers" could start a topic, while "newbies" can't.
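The rule that exclusion overrides inclusion amounts to a few lines of set logic, sketched here in Python for brevity; is_allowed and the group names are hypothetical.

```python
def is_allowed(member_groups, allowed, excluded=frozenset()):
    """Exclusion wins: any excluded group blocks access, whatever else matches."""
    groups = set(member_groups)
    if groups & set(excluded):
        return False
    return bool(groups & set(allowed))

# May this member start a topic? Allowed for "members", but not for "newbies".
can_start = is_allowed({"members", "newbies"}, allowed={"members"}, excluded={"newbies"})
```

Because the allowed and excluded lists are passed in per call, the same group can be "in" for one area of the forum and "out" for another without changing the group mechanism itself.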
I see requests for a way to force a PM (from an administrator or moderator) to be sent to a large number of members, but ignoring the members' profile settings. That is, force an email, or no email, or no PM...
Of course, these options would be available only to administrators and possibly to moderators, when they want to be sure that all members see a notice, or don't want to overload the email system.
SMF needs to be able to use different email return addresses for different purposes. For example, even though PM email copies specifically say "do not reply", plenty of morons do hit the Reply button. Since the webmaster or administrator email is typically used for the return address, they end up receiving personal messages from one member to another. This raises privacy concerns, and members accuse the administrator of snooping. The return address for such uses should be something like "NO-REPLY", and replies sent to it could be discarded by the mail system (or bounced back to the sender) without the administrator ever seeing them. Of course, members will still be complaining that their replies weren't delivered...
New members might receive signup confirmation from "new-members", emailed warnings might come from a different ID, new passwords might come from something other than "new-password" (!), digests and summaries could be mailed by "digests", etc. There could be a whole bunch of mail IDs, provided that (except for NO-REPLY), the mail actually gets read by someone.
This could be implemented as a separate database entry for each use (not the single shared email address now used), or more simply, as "defined" string constants set in some common file. For either case, if nothing is defined for an address, fall back to some common address (say, the webmaster address in the database) guaranteed to be there from setup. That way, a valid address is always used, but an administrator can get away with being lazy and not defining any additional ones. A new function would have to be written to provide the correct address, or the fallback, for the given use.
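The lookup-with-fallback function could be as small as the following sketch (Python for brevity). The purpose keys, the example.com addresses, and the name sender_for are all made up for illustration.

```python
# Hypothetical per-purpose addresses; anything not listed falls back.
MAIL_FROM = {
    "pm_copy": "no-reply@example.com",
    "digest": "digests@example.com",
    "warning": "",  # defined but left empty: treated as not set
}

def sender_for(purpose, webmaster_email, overrides=MAIL_FROM):
    """Return the address for this purpose, or the webmaster address."""
    address = overrides.get(purpose)
    return address if address else webmaster_email

pm_from = sender_for("pm_copy", "webmaster@example.com")
signup_from = sender_for("new_member", "webmaster@example.com")
```

An undefined or empty entry yields the webmaster address, so a valid sender is always returned even if the administrator defines nothing extra.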
When a moderator moves a topic to another board, a locked stub topic can be left behind to show visitors where the topic has been moved to. There are two shortcomings with this.
SMF 2 has a mod to "expire" the topic and remove it after some period of time. Such a function is needed for SMF 1.1. There's no point in having a "moved" topic hanging around forever, until someone bestirs themselves to manually erase it. The deluxe solution would permit this "moved" topic's expiration date to be reset, if needed.
The other problem with "move" topics that needs to be addressed is that the moderator who moves the topic gets their name on the topic, not the original post starter. Sometimes you don't want your name associated with something that's been moved (because of the subject matter), or it would be nice for the original member to keep their name on it. Of course, you don't necessarily want the original member to "own" the "move" topic. This would require a change to SMF to keep both the originator and current "owner" of any post or topic, who might not necessarily be the same member, and only the current owner has any rights to it (to modify, delete, etc.). Even though the text of the move topic could include the original poster, and this wouldn't require any database changes, the "started by" and "last post by" in the board listings would still be the moderator and not the original poster.
Some members like to invent exotic names for themselves, using unconventional characters from the Unicode alphabet. Unfortunately, this can lead to problems when other members want to type in that name, such as referring to that member in a post, or sending a PM. Sometimes "cut and paste" will work, but that can be inconvenient if that member name doesn't happen to exist on the currently displayed page. Some characters may not exist in some browser font sets (resulting in an undisplayable character glyph), or be difficult to type in on most PCs, regardless of the page encoding.
This mod would insert a routine to check that a requested display name includes only characters drawn from a designated alphabet. Or, it could check that certain forbidden characters are not used (but anything else goes). The switch (of which way to behave) could be hard-coded into the routine, along with the alphabet whitelist/blacklist, or it could be integrated into the Admin controls. Some forum operators don't want spaces (blanks) in names.
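Both behaviors, whitelist and blacklist, fit in one small routine, sketched here in Python for brevity. The particular character sets and the name name_ok are assumptions; a forum would choose its own alphabet.

```python
import string

# Hypothetical character sets; a forum would choose its own.
ALLOWED = set(string.ascii_letters + string.digits + "_-")  # whitelist mode
FORBIDDEN = set(" <>\"'&")                                   # blacklist mode

def name_ok(name, whitelist_mode=True):
    """Check a requested display name against the configured alphabet."""
    if not name:
        return False
    if whitelist_mode:
        return all(ch in ALLOWED for ch in name)
    return not any(ch in FORBIDDEN for ch in name)
```

With these sets, name_ok("Mr_Phil") passes in either mode, "Mr Phil" fails in both because of the space, and an accented name such as "Zoë" passes only in blacklist mode.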
This mod could also be used to check the requested name against a list of reserved names ("*admin*", "*support*", etc.) as well as containing "bad" language, and reject such names. There should already be a list of reserved names somewhere in SMF — this could be coded as an extension to that code.
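Wildcard patterns of this kind can be matched with shell-style globbing, as in this sketch (Python for brevity, using the standard fnmatch module). The pattern list and the name is_reserved are illustrative and are not SMF's actual reserved-name code.

```python
from fnmatch import fnmatchcase

# Hypothetical reserved-name patterns, stored in lowercase.
RESERVED = ["*admin*", "*support*"]

def is_reserved(name, patterns=RESERVED):
    """True if the name, compared case-insensitively, matches any pattern."""
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in patterns)
```

Broad patterns will also catch innocent names: "Badminton" contains "admin" and would be rejected by "*admin*", so the list needs some care.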
There is apparently nothing in the package installer (mods, etc.) to keep one from installing a package twice. There are many questions in support about "Cannot redeclare function name", caused by installing a mod twice. The installer should make some effort to determine whether a package is already installed, and to refuse to install it if it thinks it is. The user needs to manually clear out the remains of a failed installation, if there's anything in the SMF "registry" listing the package as already installed. As this will be keyed on the name of the package, there's nothing to prevent a conflict between two different but same-named packages (particularly if they don't come from the same package repository). Nor will it catch two different mods that contain the same function name (again, unlikely in a single package repository if its keepers are doing their job), or the same mod installed again under a new name (user error).
To tell the truth, the whole package installer is broken and should be rethought. Most support questions revolve around a package that fails to install, or installs twice, or fails to completely remove. See this discussion. Perhaps the installer should take a list of mods to be installed, and always start from the base "vanilla" forum and install everything in one go? This would of course require storing an unmodified "vanilla" copy of the application (or at least, the .php and other editable files). It would reduce the chances that a subsequent mod can't find its code-change target in the files, but it still requires some thought on how to apply two mods that modify the same piece of code, without the second mod destroying the effects of the first. Mod un-installs (removals) would be guaranteed to go cleanly, as you would be starting over each time with a vanilla copy of the forum.
SMF currently uses the server time as its starting point for calculating the date and time. A selected number of hours (converted to seconds) can be added to correct the server time to the "forum's" timezone (e.g., my server is in California and I'm in New York — I'd like Eastern Time to be shown). Individual users (members) can add an additional offset if they're not in the same timezone as the forum.
The complication is that the forum's location, as well as individual members' locations, may not use the same DST (Daylight Saving Time) rules as the server is using. The server may even be configured to use GMT or stay on local standard time. A world-wide forum may wish to use GMT, or local standard time.
All this could be dealt with by eliminating the forum and member offsets, and instead using the new PHP 5 timezone function (date_default_timezone_set()). The forum itself would have a default timezone name (e.g., "America/New_York"), and each member could select their appropriate local timezone (e.g., "America/Chicago"). This would require storing the timezone name instead of an hour offset. It might be best stored as an index into a table of timezone names. The fact that this table will change from time to time (be updated) must be taken into consideration, as well as that a member's timezone selection may become invalid and need to be changed. Finally, the conversion of existing forum and member offsets would involve taking the current offset and replacing it with a best-guess timezone selection. Many existing members would want to go into their profile and fine-tune their timezone selection to the most appropriate city.
Before embarking on this project, check mod 1504. I don't think it does exactly this, but it might be a good starting point. And of course, one's server needs to be at PHP 5.2 or whatever the minimum is to provide support for the new timezone functions.
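The conversion step might look like the sketch below (Python for brevity). The table of names, the offset-to-zone guesses, and the function names are assumptions for illustration only; several zones share each offset, which is why members would want to fine-tune the result afterwards.

```python
# Hypothetical table of timezone names, as it might be stored in the database.
TZ_NAMES = ["UTC", "America/New_York", "America/Chicago", "America/Los_Angeles"]

# Best-guess zone for a standard-time offset from UTC, in hours.
OFFSET_GUESS = {
    0: "UTC",
    -5: "America/New_York",
    -6: "America/Chicago",
    -8: "America/Los_Angeles",
}

def guess_timezone(server_utc_offset, forum_offset, member_offset=0, default="UTC"):
    """Turn the old stacked hour offsets into a best-guess timezone name."""
    total = server_utc_offset + forum_offset + member_offset
    return OFFSET_GUESS.get(total, default)

def valid_selection(name, table=TZ_NAMES, default="UTC"):
    """Fall back to the default if a stored name is no longer in the table."""
    return name if name in table else default

# Server in California (UTC-8), forum set three hours ahead, member one hour behind the forum.
forum_zone = guess_timezone(-8, 3)
member_zone = guess_timezone(-8, 3, -1)
```

Here the forum lands on "America/New_York" and the member on "America/Chicago". The valid_selection check covers the case where the table is updated and a stored name disappears.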
Something asked for from time to time is a way to shorten a bit of text, either to provide a "teaser" link, or to make a link or something fit within a certain amount of space. This can be trickier to do than it looks at first glance.
The function call may look something like:
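As one hypothetical shape for such a function, here is a minimal sketch in Python (a real SMF version would be PHP). The name shorten_text and its parameters are illustrative and not taken from SMF; the sketch trims at a word boundary and appends an ellipsis, and handles plain text only.

```python
def shorten_text(text, max_len, ellipsis="..."):
    """Trim text to at most max_len characters, breaking at a word boundary."""
    if len(text) <= max_len:
        return text
    room = max_len - len(ellipsis)
    if room <= 0:
        return ellipsis[:max_len]
    cut = text[:room + 1]  # one extra character, to see if a word ends exactly here
    if " " in cut:
        cut = cut[:cut.rindex(" ")]  # drop the partial trailing word
    else:
        cut = cut[:room]             # one long word: hard cut
    return cut.rstrip() + ellipsis

teaser = shorten_text("The quick brown fox jumps", 15)
```

The teaser comes out as "The quick...", which is shorter than the limit because the cut backs up to the previous word. Text containing BBCode or HTML tags would need extra care so that a tag is not cut in half or left unclosed.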
One thing that forum owners often do is limit the length of time in which a member can edit a posting (or delete it). After the time limit is up (anywhere from immediately to a few minutes to never), a member cannot change or delete an already posted entry. The purpose of this is to preserve continuity and flow of a conversation. Some members like to erase their original questions once they're answered (sometimes out of spite or embarrassment), which makes following responses utterly nonsensical.
Unfortunately, members sometimes have a legitimate need to add corrections, elaborations and explanations, or even apologies to what they had written earlier, but are locked out from doing this. If there have been a number of replies to the posting, a new posting could be quite a ways away and missed by readers. A mod or feature would be useful to allow the member to append additional material to a posting, without allowing any changes to the original material. Or, only [s]strikeouts[/s] could be permitted as changes to the original material.
It is an open question as to whether added material is subject to the same (additional) time limit, or is frozen when the original text is. Without adding new database entries (timestamps of when additional material was posted), only the original posting timestamp would exist. An alternative would be to create a new posting as a "subposting", where it would appear indented and under the original posting, with the same background and no divider rule. This could work like a "threaded" system where you can insert a reply to an old posting, and have it appear directly under the original (and after any existing replies to this posting), but limited to the original author. Don't take up space showing the avatar, name, motto, etc., as that's available at the original post. There could be a time limit on modifying the added material, after which it's frozen and all the author can do is add another "added material" post.
Moderators and administrators should always be allowed to edit any posting or subposting, no matter how much time has passed.
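The two checks involved, who may still edit and whether a revision only appends, can be sketched as follows (Python for brevity). The role names, the use of timestamps in seconds, and the function names are assumptions; the strikeout variant is not covered.

```python
def can_edit(role, posted_at, now, limit_seconds):
    """Staff may always edit; members only within the time limit."""
    if role in ("admin", "moderator"):
        return True
    return now - posted_at <= limit_seconds

def is_append_only(original, revised):
    """True if revised keeps the original text intact and adds material after it."""
    return revised.startswith(original) and len(revised) > len(original)

original = "How do I install a mod?"
revised = original + "\n\nAdded: solved, see the reply below."
accepted = is_append_only(original, revised)
```

Once the edit window has closed, a member's revision would be accepted only if is_append_only() holds, so the original question can no longer be erased.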
A common request is how to order the display of posts within a topic. SMF lets you globally display oldest (at top) to newest (at bottom) or vice-versa, but that's about it. It would be nice to be able to give a global default ordering, but be able to specify the ordering for specific topics, boards, or even categories. This would require extra table fields to specify the ordering method at each level, and an administrative interface to update the new field.
Another request that comes up is to be able to "pin" certain posts within a topic, much as you can pin topics within a board. As an example, you may want to pin the first (oldest) post at the top of the topic, because it contains general or global information, but display the replies (newer posts) newest to oldest. This could be a "pin" flag in a topic table, with all pinned posts being ordered in the same order as unpinned posts, or forced in a certain order. Administrators may also want to automatically pin the first post to the top for any topic in a given board or category — this might be done by automatically setting the appropriate "pin" flag in the first post. Something that has been requested more than once is the ability to repeat the first post at the top of each display page, as a reminder of what the discussion is about.
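Ordering with pinned posts might work along the lines of this sketch (Python for brevity). The dictionary keys and the name order_posts are hypothetical, and it takes the "forced order" option: pinned posts always appear oldest first, whatever the order of the rest.

```python
def order_posts(posts, newest_first=False, pin_first_post=False):
    """Return posts with pinned ones on top, the rest in the chosen order.

    posts is a list of dicts with 'id' and 'time' keys and an optional
    'pinned' flag.
    """
    chronological = sorted(posts, key=lambda p: p["time"])
    first_id = chronological[0]["id"] if (chronological and pin_first_post) else None

    def is_pinned(post):
        return post.get("pinned", False) or post["id"] == first_id

    pinned = [p for p in chronological if is_pinned(p)]
    rest = [p for p in chronological if not is_pinned(p)]
    if newest_first:
        rest.reverse()
    return pinned + rest

posts = [{"id": 1, "time": 10}, {"id": 2, "time": 20}, {"id": 3, "time": 30}]
shown = [p["id"] for p in order_posts(posts, newest_first=True, pin_first_post=True)]
```

The result is [1, 3, 2]: the first post stays on top while the replies run newest to oldest, as in the example above.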
A separate issue that affects post ordering is doing a "threaded" presentation. This requires knowing the structure within a topic (what post is in reply to what other post) and requires some additional fields. In addition, the matter of banning changes to a post after a certain time has passed, or a reply has been made (see Updates after a certain time period) might require a threaded presentation. This should be kept in mind when designing a post order mod.
Sometimes it's useful to display different images depending on the member status of a visitor (guest, member of certain groups, etc.). This might be done to limit access to "premium" content to paid members, for example. Guests and unpaid members might see a "teaser" copy, while paid members could see the full image. This could be expanded to generic attachments and even external links. If all images (avatars, inline images, attachments, signature block images) go through a standard SMF function, code could be added to check for a specific image or attachment name in a list and substitute something else in the HTML output. For links, this code could disable or change external links depending on the visitor's credentials, just as with images.
This is separate from controlling what kind of images or links a member is allowed to post, but could operate on similar principles. That is, a member in a "newbie" (probationary) group could make all the attachments and inline images and links they want, but while they are still in that group, their output links (images, links, attachments) will be dummies. The forum administrator or moderators could override this to be able to see them. If both functions (deactivate poster images/links/attachments for certain groups, and change/deactivate visitor images/links/attachments based on visitor status) are implemented together, a smoother and more consistent user experience would happen.
Once you "have control" when various images, attachments, or links appear in a post, you might be able to add additional code to do other things specifically when this image, etc. is requested. For example, a "hit counter" might be incremented when certain images are shown. It would be easiest to do this as include()'d inline code, although it might be possible to call external scripts.
© Copyright 2010–2011 by Catskill Technology Services, LLC