Seaside and Apache httpd’s mod_proxy: how to get the original URL


[Update: be sure to read the comments. Philippe has added some very valuable information there]

I’m in the process of putting a VA Smalltalk based Seaside application online on a Linux server. The server serves multiple applications and static web sites, so I had to use mod_proxy to forward certain URLs to the Smalltalk image.

So my httpd.conf for this virtual host looks like this (I left out some stuff so we can concentrate on the topic here):

<VirtualHost myserver.com:80>
    # set server name
    ServerName myserver.com
    # pass the original Host header on to the backend
    ProxyPreserveHost On
    # rewrite incoming requests
    RewriteEngine On
    RewriteRule ^/files/(.*) http://localhost:8080/files/$1 [proxy,last]
    RewriteRule ^/(.*)$ http://localhost:8080/MyApplication/$1 [proxy,last]
</VirtualHost>

I read on the Seaside mailing list that ProxyPreserveHost On makes Apache hand over the URL that the user sees in her browser to Seaside. And somehow I expected this URL to be available in the application’s url by sending

self application url.

It turns out this isn’t the case. All I get is /MyApplication?_s=…

I was somehow expecting that one of WAApplication’s variables (basicUrl, serverHostname, serverPath, serverPort and so on) would be filled with one of these parameters. But looking at WAServerAdaptor>>#requestFor:, it becomes clear that this cannot be the case. At least not on VA ST, and also not in Pharo 1.3 with Seaside 3.0.6. In the end this means that using mod_proxy does not change the contents of the URL that is handed into the Smalltalk image at all; it just makes sure all requests are forwarded to the Smalltalk image.

At our Smalltalk Inspect Fest I got the tip (thanks, Norbert!) to use

self requestContext request url.

Which returns the very same URL.

Debugging around a little (again with some help from Norbert), I found out that what I was looking for is in the ‘referer’ header of the HTTP request. Again, this has nothing to do with mod_proxy: the requesting URL is also in the referer header on a local machine where Apache is not even installed.
So here’s what I’m using now to find out which URL the user entered in their browser:

(WAUrl absolute: self requestContext request referer) withoutQuery
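
A possibly more robust alternative to the referer, following the X-Forwarded-Host hint in the comments below: mod_proxy adds an X-Forwarded-Host header to the requests it forwards, and with ProxyPreserveHost On the Host header carries the public host name as well. What follows is only a sketch under assumptions, not tested on VA Smalltalk: publicBaseUrl is a made-up method name on a component, I assume that WARequest>>#headerAt: answers nil for a missing header and that header names are stored in lower case, and the http scheme is simply hard-coded.

```smalltalk
publicBaseUrl
    "Hypothetical helper: answer a WAUrl for the host the browser talked to, or nil."
    | request host |
    request := self requestContext request.
    "Assumption: header keys are lower case and #headerAt: answers nil when absent."
    host := request headerAt: 'x-forwarded-host'.
    host isNil ifTrue: [ host := request headerAt: 'host' ].
    host isNil ifTrue: [ ^ nil ].
    "Assumption: plain http. Behind a chain of proxies this header may hold a comma-separated list."
    ^ WAUrl absolute: 'http://' , host , '/'
```

Unlike the referer, these headers describe the current request, so they should not depend on which page the user came from.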

What’s it good for?

In many cases it is not really necessary to know what URL a user requested, as long as the proxy forwards requests to the Smalltalk image, so many Seaside applications may not need this info anyway. In my case I needed it to send out a link by mail which a user can click to get to a certain page and confirm an action. I wanted to use functionality within the image that does not need to be configured separately for the development, test and production environments, since such configuration is a possible point of failure.

3 thoughts on “Seaside and Apache httpd’s mod_proxy: how to get the original URL”

  1. This would better be discussed on the Seaside mailing list.

    > My initial understanding of what mod_proxy would do when
    > ProxyPreserveHost is turned On was that it would forward the HTTP
    > Request to the desired url/port but still preserve the hostname from
    > the original request. Obviously, that is not the case.

    That’s what should happen. At least it should be available in the X-Forwarded-Host header.

    > …as far as I understand it, this would mean I cannot use mod_proxy for
    > load balancing in the hopefully not so unlikely case the application
    > finds many users.

    Why? How do you do load balancing?

    If you set #defaultName to ‘MyApplication’ on the default dispatcher, then you are down to:

    ProxyPass / http://localhost:8080/

  2. Hi Philippe,

    first of all, thanks a lot for your comment. It was very helpful in understanding what is going on.

    >Seaside can not know the URL of the request because you instruct mod_proxy to change it from /foo to /foo/MyApplication. mod_proxy doesn’t communicate the original URL to the backend server.

    That is what I had to find out as well. I must admit I may have misunderstood Lukas’ comment about ProxyPreserveHost in the linked forum thread. But I misunderstood the Apache docs as well.

    >self application url

    >That’s the URL under which the application is registered.

    Yes, and that is what I see.

    >self requestContext request url
    >That’s the URL of the current request as passed to Seaside by mod_proxy.

    My initial understanding of what mod_proxy would do when ProxyPreserveHost is turned On was that it would forward the HTTP Request to the desired url/port but still preserve the hostname from the original request. Obviously, that is not the case.

    >self requestContext request referer
    >That’s the URL of the last request the user agent made. This is almost certainly not what you want.

    Hmm. I made a few extra tests with my application. Your argument is logical, and the referer header sure is not there for the purpose I am using it for. But it seems it always contains the URL I am looking for. Maybe that would not be the case if I used it on the very first component called in my application. But that component is the login screen, and I need the information in a callback of a component that is a few show: calls away from it, so the referer in that case always seems to be my start/login page. Even if I browse to, say, ibm.com in the middle of a session and use the back button, the referer is correct.

    So, yes, the referer is not guaranteed to contain the base URL of an application, but it seems it is good enough for what I need. I’ve tested Mozilla and Chrome so far and it seems I can follow that path.

    >At this point you have several options:

    None of these made my mouth water so far 😉

    > – don’t tell mod_proxy to change the request URL and either remove the application name for the path or accept it

    …as far as I understand it, this would mean I cannot use mod_proxy for load balancing in the hopefully not so unlikely case the application finds many users. And, what you are describing is exactly what I thought ProxyPreserveHost is for: don’t change the URL, but redirect the request. But maybe my problem is that ProxyPreserveHost doesn’t work in combination with RewriteRules, but only with ProxyPassReverse. I remember having different problems with that directive, however….

    > – manually remove MyApplication from the URL when generating the mail link (be sure to #copy first)

    Well, since the URL doesn’t contain the host name, this doesn’t really help. I would then have to configure the WAApplication’s settings, using an ini file or the like, and I’d like to avoid this.

    > – use another protocol than HTTP for reverse proxying

  3. Seaside can not know the URL of the request because you instruct mod_proxy to change it from /foo to /foo/MyApplication. mod_proxy doesn’t communicate the original URL to the backend server.

    self application url

    That’s the URL under which the application is registered. It should not have a session key. This is used for the URLs that Seaside generates. The application can not have any state that depends on the request, otherwise you’d end up with horrible threading issues.

    self requestContext request url

    That’s the URL of the current request as passed to Seaside by mod_proxy.

    self requestContext request referer

    That’s the URL of the last request the user agent made. This is almost certainly not what you want.

    At this point you have several options:
    – don’t tell mod_proxy to change the request URL and either remove the application name for the path or accept it
    – manually remove MyApplication from the URL when generating the mail link (be sure to #copy first)
    – use another protocol than HTTP for reverse proxying
