Posted on 2017-Mar-01 at 10:37:07 by Phil
Last update on 2022-Mar-21 at 15:05:00 by Phil
When implementing a site on an Apache server, URL rewriting/redirecting is a major headache. The rewriting rules are really convoluted, especially when it comes to making multiple passes through an .htaccess file. That can trip up even the most experienced expert. Even Apache’s own documentation for URL rewriting calls it “voodoo”.
What can be done to make this a clean-cut predictable process, rather than
black magic? On this site, I tried doing a general rewriting module in PHP
(rewriter.php). It used normal string processing code in PHP to disassemble the
incoming URL into its component parts, modified those parts with normal PHP code
(testing and string processing), glued it all back together, and passed it to
the PHP header("Location: XXXX") call. It worked fantastically well, except for
handling POST data from a form: a Location redirect makes the browser issue a
new request, normally a GET, so the form data is lost. That required a kludgey
workaround to preserve the POST data whenever it was detected (a major objective
was to not require any changes to downstream PHP files). I was never able to get
form operations such as CAPTCHA to work properly. Eventually I had to concede
defeat and go back to fighting with .htaccess. It turned out that my host
(Lunarpages, now HostPapa) had not set up the server in the normal manner
— if the URI started with a real path, it jumped directly to that
directory, bypassing the normal chain of .htaccess file processing (starting at
/)! Even after I figured that out, it was still a lot of trial and error to get
.htaccess URL rewriting to do exactly the things I wanted (and I’m
still not 100% sure that it works right!).
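The disassemble, modify, and reassemble approach described above can be sketched roughly as follows. This is only an illustration in Python, not the actual rewriter.php; the two rules (a renamed /old-articles/ directory and a dropped "ref" parameter) are made-up assumptions.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rewrite_url(url):
    """Split a URL into its parts, adjust them, and glue it back together."""
    parts = urlsplit(url)
    path = parts.path
    query = parse_qsl(parts.query, keep_blank_values=True)

    # Hypothetical rule 1: a directory was renamed.
    if path.startswith("/old-articles/"):
        path = "/articles/" + path[len("/old-articles/"):]

    # Hypothetical rule 2: drop a tracking parameter, keep everything else.
    query = [(k, v) for k, v in query if k != "ref"]

    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )
```

The result would then be sent back in a Location header, which is exactly the step where POST data goes missing.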
Applications such as WordPress or Drupal simply sweep up any URI that’s
not a real address (i.e., is not a real directory or file, per the -d and -f
flags), and feed it to a PHP routine (from /index.php) to process it in a manner
similar to what I did. This is how they handle SEO “fake” paths, among other
things. I haven’t looked at their internals to see if and how they process
POST data. Other applications, such as osCommerce, embed the various Query String
data into the human-friendly URI, and use .htaccess rewriting to extract the
useful data into a Query String and discard the human-friendly part. For
example, /product_display/p-15234-mr-fusion-reactor might become
/product_display.php?product_id=15234. SMF (Simple Machines Forum)
“Pretty URLs” stores the human-friendly name (title) in a database
table, along with the various parameters needed to pass to the real routines.
This posting might show up as /url-rewriting-made-easy, and become
internally /show_thread.php?id=6534. This avoids having to embed
ugly numbers in the title (like osCommerce), but requires a database entry for
each article, etc.
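To make the contrast between the last two approaches concrete, here is a rough Python sketch using the example URLs above. The regular expression and the dictionary standing in for SMF's database table are assumptions for illustration, not the real osCommerce or SMF code.

```python
import re

# osCommerce style: the numeric id is embedded in the human-friendly path,
# so a pattern match is enough to recover it.
PRODUCT_RE = re.compile(r"^/product_display/p-(\d+)-[^/]*$")

def resolve_embedded_id(path):
    """Return the real URL for an id-in-the-path URI, or None if no match."""
    m = PRODUCT_RE.match(path)
    if m is None:
        return None
    return "/product_display.php?product_id=" + m.group(1)

# SMF "Pretty URLs" style: the title alone is the key, so every page needs
# a stored entry. This dict is a stand-in for the database table.
PRETTY_TABLE = {"url-rewriting-made-easy": "/show_thread.php?id=6534"}

def resolve_by_lookup(path):
    """Return the real URL stored for a title, or None if it is unknown."""
    return PRETTY_TABLE.get(path.strip("/"))
```

The first function needs no storage but leaves the number visible in the URL; the second hides the number at the cost of one stored row per page.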
Another solution might be to have a standalone compiler that would take URL specifications, in some sort of language, and output a chunk of code ready to drop into your .htaccess file. If someone wanted to put enough effort into it, I’m sure it could be done. This would be something you run on your PC whenever you want to make a change to URL rewriting/redirection, rather than something running live on the server to process each incoming URL.
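As a very rough idea of what such a compiler could look like, here is a toy version in Python. The three-column spec format ("redirect" or "rewrite", a pattern, a target) is invented for this sketch, and patterns are passed through as raw regular expressions; a real tool would need escaping, conditions, and far more checking. The emitted directives use only standard mod_rewrite syntax (RewriteEngine, RewriteRule, and the R, L, and QSA flags).

```python
def compile_rules(spec):
    """Turn a small rule spec into mod_rewrite directives for .htaccess."""
    lines = ["RewriteEngine On"]
    for line in spec.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments in the spec
        kind, pattern, target = line.split()
        # In .htaccess (per-directory) context, the pattern is matched
        # against the path without its leading slash.
        pattern = pattern.lstrip("/")
        if kind == "redirect":
            flags = "[R=301,L]"   # external, permanent redirect
        elif kind == "rewrite":
            flags = "[L,QSA]"     # internal rewrite, keep the query string
        else:
            raise ValueError("unknown rule kind: " + kind)
        lines.append("RewriteRule ^%s$ %s %s" % (pattern, target, flags))
    return "\n".join(lines) + "\n"

SPEC = """
# kind     pattern                          target
redirect   /old-page                        /new-page
rewrite    /product_display/p-([0-9]+)-.*   /product_display.php?product_id=$1
"""

if __name__ == "__main__":
    print(compile_rules(SPEC), end="")
```

Running it on your PC prints a block that could be pasted into .htaccess, so nothing extra has to run on the server for each request.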
Any other ideas?
Posted on 2018-Oct-24 at 12:36:00 by Phil
Last update on 2018-Oct-24 at 12:43:45 by Phil
On the subject of SEO, I think that SMF’s Pretty URLs mod (SEO implementation) erred by caching in a DB table every single full raw URL entry (including the full Query String): one for display, another to delete, another to edit, etc. If I were doing this from scratch, say for a new CMS, I would take a raw (non-SEO) URL and convert it for output to a path of “directories”,
where the first “directory” is the primary activity, for example mapping
“show” to “show_thread.php”. The last
“directory” is the human-readable text that maps to
“topic=6534”. Any pairs of directories in between get turned into
keyword=value pairs in the resulting Query String (on input). The only DB entry
would be a mapping of url-rewriting-made-easy to topic=6534.
All other stuff would be built off of this base entry, on the fly (for output).
The only real difference from osCommerce would be that the topic id here would
not appear in the SEO/SEF URL (as redundant information), but be looked up in
the DB.
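A rough Python sketch of this scheme, in both directions, might look like the following. The "page" parameter, the ACTIONS table, and the dictionary standing in for the single DB mapping are assumptions made for illustration; only “show”, “show_thread.php”, url-rewriting-made-easy, and topic=6534 come from the description above.

```python
from urllib.parse import urlencode, quote, unquote

ACTIONS = {"show": "show_thread.php"}       # first "directory" -> script
SLUGS = {"url-rewriting-made-easy": 6534}   # stand-in for the one DB mapping

def seo_to_raw(path):
    """Input direction: SEO path -> real script plus Query String."""
    segs = [unquote(s) for s in path.strip("/").split("/")]
    # Need an action and a slug, with whole keyword/value pairs in between.
    if len(segs) < 2 or len(segs) % 2 != 0:
        return None
    action, middle, slug = segs[0], segs[1:-1], segs[-1]
    if action not in ACTIONS or slug not in SLUGS:
        return None
    params = [("topic", SLUGS[slug])]
    params += list(zip(middle[0::2], middle[1::2]))
    return "/" + ACTIONS[action] + "?" + urlencode(params)

def raw_to_seo(action, topic_id, extra=()):
    """Output direction: build the SEO path on the fly from the base entry."""
    slug = next((s for s, t in SLUGS.items() if t == topic_id), None)
    if slug is None:
        return None
    segs = [action]
    for key, value in extra:
        segs += [str(key), str(value)]
    segs.append(slug)
    return "/" + "/".join(quote(s, safe="") for s in segs)
```

For instance, under these assumptions /show/page/2/url-rewriting-made-easy would come in as /show_thread.php?topic=6534&page=2, and the same path would be regenerated for output from the action, the topic id, and the extra pair.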
There is a question about where to put any real directory information, such as a discussion topic or a product that is not at the root level. Perhaps there could be an “end of Query String parameters” marker that tells (on input) where the real path starts, whether or not it is actually needed (or is redundant information).
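What the marker would look like is left open. Purely as a hypothetical illustration, suppose it were a path segment consisting of a single "-"; splitting the incoming path at it could then be sketched in Python like this (the marker choice and the example paths are assumptions):

```python
END_MARKER = "-"  # hypothetical "end of Query String parameters" marker

def split_at_marker(path):
    """Split path segments into (parameter part, real-path part)."""
    segs = path.strip("/").split("/")
    if END_MARKER in segs:
        i = segs.index(END_MARKER)
        return segs[:i], segs[i + 1:]
    # No marker: everything belongs to the parameter part.
    return segs, []
```

With this choice, /show/page/2/-/forums/help/url-rewriting-made-easy would split into the parameter segments show, page, 2 and the real path forums, help, url-rewriting-made-easy. A real design would have to make sure the marker can never collide with a legitimate keyword or value.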
Having a minimal amount of information in the DB would help speed up the process (in both directions) and keep the database size reasonable. It would also help whenever you have to manually fix up an entry: one entry rather than dozens and dozens.
All content © copyright 2005 – 2025
by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this
site remains under the copyright and license of its owners.
Catskill Technology Services, LLC does not claim copyright over such software.
This page is https://www.catskilltech.com/utils/show.php?link=url-rewriting-made-easy
Last updated Sat, 28 Dec 2024 at 11:29 PM