
URL rewriting made easy


Phil

« March 01, 2017, 10:37:07 AM »
When implementing a site on an Apache server, URL rewriting/redirecting is a major headache. The rewriting rules are really convoluted, especially since a rewritten request can make another pass through the .htaccess rules, and that can trip up even experienced experts. Even Apache's own mod_rewrite documentation quotes a user calling it "voodoo".

What can be done to make this a clean-cut, predictable process rather than black magic? On this site, I tried writing a general rewriting module in PHP (rewriter.php). It used ordinary PHP string processing to disassemble the incoming URL into its component parts, modified those parts with ordinary PHP code (tests and string operations), glued everything back together, and passed the result to a PHP header("Location: XXXX") call. It worked fantastically well, except for handling POST data from a form: a Location header normally goes out as a 302 redirect, and browsers re-request the new URL with GET, so the form data is dropped. Preserving the POST data whenever it was detected required a kludgy workaround (a major objective was not to require any changes to downstream PHP files), and I was never able to get form operations such as CAPTCHA to work properly.

Eventually I had to concede defeat and go back to fighting with .htaccess. It turned out that my host (Lunarpages) had not set up the server in the normal manner: if the URI started with a real path, it jumped directly to that directory, bypassing the normal chain of .htaccess processing starting at /! Even after I figured that out, getting .htaccess URL rewriting to do exactly what I wanted still took a lot of trial and error (and I'm still not 100% sure that it works right!).
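The disassemble/modify/reassemble approach can be sketched roughly as follows, in Python purely for illustration (the original rewriter.php is not shown, so this is not its code). The rules inside `rewrite()` and the `sessid` parameter are hypothetical placeholders. The second function shows one possible way around the POST problem, assuming the redirect status can be set explicitly: 307 tells the browser to repeat the same method and body at the new URL, unlike the default 302.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def rewrite(url: str) -> str:
    """Split a URL into parts, adjust them with ordinary code, rejoin them.

    The rules below are hypothetical examples, not the original rewriter.php.
    """
    parts = urlsplit(url)
    # Keyword/value pairs of the Query String (assumes no repeated keys).
    query = dict(parse_qsl(parts.query))

    # Example rule: canonical lowercase host name.
    netloc = parts.netloc.lower()

    # Example rule: an old script name moves to a new one; the query is kept.
    path = "/show_thread.php" if parts.path == "/old_thread.php" else parts.path

    # Example rule: drop a (hypothetical) session parameter.
    query.pop("sessid", None)

    return urlunsplit((parts.scheme, netloc, path, urlencode(query), parts.fragment))


def redirect_status(method: str) -> int:
    """Choose the status code to send with the Location header.

    A 302'd POST is typically re-issued by browsers as a GET, dropping the
    form body. 307 keeps the method and body, so the browser re-posts the
    form to the new URL. 301 is used here for plain page requests.
    """
    return 307 if method.upper() == "POST" else 301
```

In PHP itself, the status can be passed as the third argument, e.g. header("Location: ...", true, 307).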

Applications such as WordPress or Drupal simply sweep up any URI that isn't a real file or directory and hand it to a PHP routine (via /index.php), which processes it much as my rewriter did. This is how they handle SEO "fake" paths, among other things. I haven't looked at their internals to see whether and how they process POST data. Other applications, such as osCommerce, embed the Query String data in the human-friendly URI and use .htaccess rewriting to extract the useful data into a Query String, discarding the human-friendly part. For example, /product_display/p-15234-mr-fusion-reactor might become /product_display.php?product_id=15234. SMF (Simple Machines Forum) "Pretty URLs" instead stores the human-friendly name (title) in a database table, along with the parameters to pass to the real routines. This posting might show up as /url-rewriting-made-easy and internally become /show_thread.php?id=6534. This avoids embedding ugly numbers in the URL (as osCommerce does), but requires a database entry for each article, etc.
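A rough Python sketch of these two mapping styles, routed the WordPress/Drupal way (real files pass through, everything else goes to one handler). `REAL_FILES`, `SLUG_TABLE`, and `route()` are hypothetical stand-ins for the file system and the SMF database table. Because this kind of mapping happens inside a single request (an internal rewrite rather than a browser redirect), form POST data would reach the handler untouched.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the file system and the SMF-style DB table.
REAL_FILES = {"/index.php", "/product_display.php", "/show_thread.php"}
SLUG_TABLE = {"url-rewriting-made-easy": "/show_thread.php?id=6534"}

# osCommerce style: the numeric id travels inside the friendly path,
# and the trailing words after it are simply ignored.
PRODUCT_RE = re.compile(r"^/product_display/p-(\d+)(?:-[^/]*)?$")


def route(path: str) -> Optional[str]:
    """Map a request path to the internal URL that should handle it."""
    if path in REAL_FILES:
        return path  # a real file is served as-is
    m = PRODUCT_RE.match(path)
    if m:
        return f"/product_display.php?product_id={m.group(1)}"
    slug = path.strip("/")
    if slug in SLUG_TABLE:
        return SLUG_TABLE[slug]  # SMF style: one lookup per friendly name
    return None  # nothing matched: the handler would answer 404
```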

Another solution might be a standalone compiler that takes URL specifications, written in some sort of small language, and outputs a chunk of code ready to drop into your .htaccess file. If someone wanted to put enough effort into it, I'm sure it could be done. You would run it on your PC whenever you want to change URL rewriting/redirection, rather than having something run live on the server to process each incoming URL.
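A toy version of such a compiler, sketched in Python under several assumptions: the spec syntax (`source => target`, with `{name}` and `{name:int}` placeholders) is invented here for illustration, the output targets a .htaccess in the document root (where the matched path has no leading slash), and every rule gets `[L,QSA]` flags (stop after this rule on the current pass, and append any existing query string).

```python
import re

# Invented spec syntax: {name} matches one path segment, {name:int} digits only.
PLACEHOLDER = re.compile(r"\{(\w+)(?::(int))?\}")
SPECIAL = ".^$*+?()[]{}|\\"


def escape_literal(text: str) -> str:
    """Escape characters that are special in mod_rewrite (PCRE) patterns."""
    return "".join("\\" + c if c in SPECIAL else c for c in text)


def compile_rule(source: str, target: str, flags: str = "L,QSA") -> str:
    """Turn one spec line into a RewriteRule for a root-level .htaccess.

    Assumes each placeholder name appears once in the source; a target
    placeholder missing from the source raises KeyError.
    """
    pattern, groups, pos = "^", {}, 0
    for m in PLACEHOLDER.finditer(source):
        pattern += escape_literal(source[pos:m.start()])
        pattern += "([0-9]+)" if m.group(2) == "int" else "([^/]+)"
        groups[m.group(1)] = len(groups) + 1  # backreference number $N
        pos = m.end()
    pattern += escape_literal(source[pos:]) + "$"
    # Replace {name} in the target with the matching $N backreference.
    substitution = re.sub(r"\{(\w+)\}", lambda m: f"${groups[m.group(1)]}", target)
    return f"RewriteRule {pattern} {substitution} [{flags}]"


def compile_spec(spec: str) -> str:
    """Compile a whole spec (one 'source => target' per line) to .htaccess text."""
    lines = ["RewriteEngine On"]
    for raw in spec.strip().splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        source, target = (s.strip() for s in raw.split("=>"))
        lines.append(compile_rule(source, target))
    return "\n".join(lines)
```

For example, the osCommerce-style line `product_display/p-{id:int}-{slug} => /product_display.php?product_id={id}` compiles to `RewriteRule ^product_display/p-([0-9]+)-([^/]+)$ /product_display.php?product_id=$1 [L,QSA]`.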

Any other ideas?


Phil

Re: URL rewriting made easy
« Reply #1: October 24, 2018, 12:36:00 PM »
On the subject of SEO, I think SMF's Pretty URLs mod (its SEO implementation) erred by caching every full raw URL (including the full Query String) in a DB table: one entry for display, another for delete, another for edit, and so on. If I were doing this from scratch, say for a new CMS, I would take a raw (non-SEO) URL like
```
/show_thread.php?topic=6534&action=modify
```
and convert it for output to
```
/show/action/modify/url-rewriting-made-easy/
```
where the first "directory" is the primary activity, here mapping "show" to "show_thread.php". The last "directory" is the human-readable text that maps to "topic=6534". On input, each pair of directories in between is turned into a keyword=value pair in the resulting Query String. The only DB entry would be the mapping of url-rewriting-made-easy to topic=6534; everything else would be built from this base entry on the fly (on output). The only real difference from osCommerce is that the topic id would not appear in the SEO/SEF URL (it is redundant information), but would be looked up in the DB.
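A minimal Python sketch of both directions, assuming two in-memory tables with the hypothetical names `SCRIPTS` and `SLUGS`; only `SLUGS` (the single slug-to-topic entry) would live in the database. It assumes one slug per URL and no "/" inside keyword or value text, and has no error handling for unknown activities or slugs.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical tables: activity -> script, slug -> (keyword, value).
SCRIPTS = {"show": "show_thread.php"}
SLUGS = {"url-rewriting-made-easy": ("topic", "6534")}


def encode(raw: str) -> str:
    """Raw URL -> SEO URL, built on the fly from the single slug entry."""
    script, _, query = raw.lstrip("/").partition("?")
    params = dict(parse_qsl(query))
    activity = next(name for name, file in SCRIPTS.items() if file == script)
    # Find the slug whose keyword=value pair is present, then drop that pair.
    slug = next(s for s, (k, v) in SLUGS.items() if params.get(k) == v)
    key, _ = SLUGS[slug]
    del params[key]
    # Remaining pairs become alternating keyword/value directories.
    middle = [part for pair in params.items() for part in pair]
    return "/" + "/".join([activity, *middle, slug]) + "/"


def decode(seo: str) -> str:
    """SEO URL -> internal URL: first segment picks the script, last is the
    slug, and the segments in between are read as keyword/value pairs."""
    segments = seo.strip("/").split("/")
    activity, middle, slug = segments[0], segments[1:-1], segments[-1]
    # zip() silently drops a stray unpaired segment.
    params = dict(zip(middle[0::2], middle[1::2]))
    key, value = SLUGS[slug]  # the only database lookup
    params[key] = value
    return f"/{SCRIPTS[activity]}?{urlencode(params)}"
```

Decoding the SEO form yields /show_thread.php?action=modify&topic=6534: the same parameters as the raw URL, just in a different order.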

There is still the question of where to put real directory information, such as a discussion topic or a product that is not at the root level. Perhaps there could be an "end of Query String parameters" marker that tells the decoder (on input) where the real path starts, whether or not that path is actually needed or is just redundant information:
```
/show/action/modify/q/practical-computing/helpful-hints/url-rewriting-made-easy/
```
Having a minimal amount of information in the DB would help speed up the process (in both directions) and keep the database size reasonable. It would also help whenever you have to manually fix up an entry: one entry rather than dozens and dozens.
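A decoder that honours such a marker might look like the following hypothetical Python sketch. It treats "q" from the example above as a reserved marker (a parameter literally named q at a keyword position would be misread as the marker), and it simply prefixes the real directory to the script path, which is an assumption about the site layout rather than anything specified here. It expects at least an activity and a slug.

```python
from urllib.parse import urlencode

# Hypothetical tables and marker; only SLUGS would live in the database.
SCRIPTS = {"show": "show_thread.php"}
SLUGS = {"url-rewriting-made-easy": ("topic", "6534")}
MARKER = "q"


def decode(seo: str) -> str:
    """SEO URL -> internal URL, honouring an end-of-parameters marker."""
    segments = seo.strip("/").split("/")
    activity, rest = segments[0], segments[1:]
    params, i = {}, 0
    # Read keyword/value pairs until the marker, never consuming the final slug.
    while i + 2 < len(rest) and rest[i] != MARKER:
        params[rest[i]] = rest[i + 1]
        i += 2
    # Anything between the marker and the slug is the real directory path.
    dirs = rest[i + 1:-1] if i < len(rest) - 1 and rest[i] == MARKER else []
    key, value = SLUGS[rest[-1]]  # the single DB lookup
    params[key] = value
    # Assumption: the script lives under the real directory.
    prefix = "/" + "".join(d + "/" for d in dirs)
    return f"{prefix}{SCRIPTS[activity]}?{urlencode(params)}"
```

Without the marker the same function falls back to the plain form, so /show/action/modify/url-rewriting-made-easy/ still decodes to /show_thread.php?action=modify&topic=6534.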
« Last Edit: October 24, 2018, 12:43:45 PM by Phil »