SunriseXP tutorial

Overview

Sunrise is used in conjunction with Plucker, a PalmOS offline reader, and Vade Mecum, a Pocket PC equivalent. These programs are similar to a web browser (like the one in which you are reading this piece) except that they are meant for offline viewing: you download the entire content first and later read it on your PDA without an internet connection. Those of you who use AvantGo are already familiar with the concept (though I believe AvantGo also lets you surf online). If you don’t have either of the readers on your PDA yet, don’t bother reading any further; get yourself a copy of one of them first:

Plucker for Palm: http://plkr.org/
Vade Mecum for PPC: http://vade-mecum.sourceforge.net/

For Palm folks, all you need from the Plucker website is the “Plucker Viewer” in whatever language you use. As of this writing, I suggest you use version 1.8, the last stable version. I am going to avoid getting into detail about how Plucker works, since that could be a totally separate piece, but the Plucker website has pretty extensive documentation, as does MobileRead itself.

Since I don’t have Wi-Fi on my Tungsten E2, Sunrise/Plucker is a great way to provide myself with my favorite web pages, e-books, etc. I run Sunrise nightly with a HotSync, and when I wake up in the morning, I have the day’s newspaper, weather, movie listings, favorite blogs, etc. right there for me to view during that day. Pre-formatted e-books are also available in Plucker format. I also keep some reference websites permanently on my PDA for future use. Pretty much anything you can find by surfing online in your browser can be (with a few limitations) parsed with Sunrise XP and viewed later in Plucker.

My experience is exclusively with Plucker, with no exposure at all to Vade Mecum, so this tutorial will talk primarily about Plucker/Palm use, but I presume that most of this content applies to either application. (I would appreciate it if any Vade Mecum users out there could chime in if there is a notable difference in how it works, or, even better, modify a copy of this tutorial for the benefit of Vade Mecum users.)

Please also note that there have been earlier Java-based versions of Sunrise (called “Sunrise Desktop”). The old Sunrise (and JPluckX, an even earlier creation of Laurens’) functions similarly, but for most users, Sunrise XP will be the most user-friendly and versatile one.

System Requirements

Windows XP / 2000 / 2003
Pentium II or better
128 MB RAM for common websites; 256 MB for larger documents.
HotSync Manager 4 or 6 for using the HotSync Conduit. (The conduit is an optional component.)

What exactly does Sunrise XP do?

Sunrise XP is a desktop PC-based program that downloads web content to your PC when you HotSync, on a schedule you designate, or both. Because all the content is fetched in advance, viewing in the Plucker viewer on your Palm is fast, since no downloading takes place while you are reading. After Sunrise downloads the web content, it strips out all the unnecessary HTML code to make the files smaller and to improve the viewing experience on a small PDA screen. You have various ways to control what downloads you want, when you want them and how you want them formatted. You also can filter out unneeded content (more on that later). Sunrise then takes this modified HTML and puts it into a *.pdb (Palm database) file format.

Once the .pdb files are generated, they wait on your computer to be loaded onto your PDA, either in RAM or on the expansion card. For Palm users, the best way to get them to your PDA is to configure Sunrise to update as part of the HotSync process, through a HotSync conduit. (Don’t worry about terminology; this will all be explained later.) In this way, it functions very much like AvantGo, except there are no limitations on the amount of content you can download per day, there are no obtrusive ads, the content is more compact memory-wise and everything’s open-source!

Starting Out/Installation

Download the most recent (or stable) version of Sunrise XP. (As of this writing, the most current version is Sunrise XP v2.01.) Save the file sunrisexp-setup.exe (executable file) to your desktop computer (the simplest way is to put it on the desktop in Windows). Click on the executable file to start the installer, and go through all the steps to get Sunrise installed on your PC. At one point you should be asked whether you want the Sunrise XP stub installed on your PDA; you should say yes. A small program will then be loaded into Palm’s Quick Install tool, for installation onto the PDA at the next HotSync. HotSync your PDA to get this “stub” installed; you will need it for Sunrise to send the files to your PDA. Once everything’s fully installed on your PC and Palm, you can delete that .exe installation file if you wish.

Creating SXLs - Default Settings

Sample SXL (Figure 1)

Sunrise XP uses files generated by the user with directions on what and how to download websites. These files are called SXLs, which stands for Sunrise XP List. When you start Sunrise XP for the first time, you'll be presented with a blank SXL. First, though, Figure 1 shows what a sample SXL looks like with all the information already entered to give you an idea of where we’re going. I have also posted this SXL if you would like to work directly with the sample SXL at some point later.

The SXL can have as many or as few documents as you wish; in this example, there are seventeen listed (seventeen rows). Note that by the term “document”, I’m referring to a specific group of web pages listed in the SXL; depending on the “link depth” that is selected (seventh column), each main “document” could have hundreds of linked web pages below it. As you can see, some of these can get pretty large—a function of downloaded pictures and many many links! Unless you have lots of room in your RAM, I encourage you to load the Plucker documents on your external memory card—more on that later.

Default Document Properties Box (Figure 2)

OK, let’s go back to your blank SXL. The first thing you'll want to do is edit the default properties for new documents. Select "Edit --> Default Properties" and the Default Document Properties Box will come up (Figure 2).

Remember, these are just defaults; ANY of them can be changed later as needed for individual documents on a case-by-case basis. But try to set the defaults to what works best for your needs, as I describe below.

If it’s not already displayed, click the first tab, “Main”, at the top. Leave all the blank settings under “Document” and “Source” as-is for now; those will be changed from document to document. Under “Image Settings” you have choices for the quality of images as viewed in Plucker. As you can imagine, the quality, size and coloration of images will influence the size of the document files. To get an idea of how these different image qualities appear, visit http://plkr.org/gal to see screenshots showing different bits-per-pixel (bpp) color depths. Naturally, your PDA hardware may limit the quality, especially with an older or lo-res unit. You can select “no images” as a default if you don’t foresee yourself displaying them. If you have a recent model PDA with high resolution (with pixel counts 320 x 320 or 480 x 320), I recommend you select “Thousands of Colors (16 bpp)”. Otherwise, select what is most appropriate for your hardware. Remember, all these can be changed later.

Max. Size refers to the size of the pictures in the web site. For now, let’s make the default 300 x 300. Also, let’s check the box for “Include Full-Size Alternate Images.” This means that if you do have an image that is too large to be displayed full-size on your screen, you can click on that image in Plucker to see the full size image, after which you can pan Left-Right-Up-Down to see it at full size.
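To get a feel for what the Max. Size and color depth settings mean for file size, here is a rough Python sketch. It assumes that an oversized image is scaled down proportionally to fit inside the maximum box, and it counts uncompressed pixel data only. Both are assumptions made for illustration, not a description of how Sunrise XP actually processes images, and the names fit_within and raw_bytes are made up for this example.

```python
def fit_within(width, height, max_width=300, max_height=300):
    """Scale (width, height) down to fit the box, keeping the aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height  # already small enough, leave untouched
    # Compare aspect ratios with integer math to avoid rounding surprises.
    if width * max_height >= height * max_width:
        # Width is the limiting side.
        return max_width, max(1, height * max_width // width)
    # Height is the limiting side.
    return max(1, width * max_height // height), max_height


def raw_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size; real Plucker files are compressed."""
    return width * height * bits_per_pixel // 8
```

With these assumptions, a 1024 x 768 picture would shrink to 300 x 225. At 16 bpp that is 135,000 bytes of raw pixel data, while the same picture at 4 bpp needs 33,750 bytes, which is why the color depth setting matters so much on a small memory budget.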

OK, let’s check out the next tab (Output). Let’s assume that we don’t have a default schedule for updates in the first box, since some web sites you might want to download daily, others weekly, others every time, etc. You will want a destination, however, designated in the second box. You’ll note that you have a choice to either set up your Sunrise output to be automatically installed at HotSync, or else to simply be put in a folder somewhere on your PC (in which case you would need to manually load the files onto your PDA). Most of us using the Palm OS are going to want to load our documents directly to the PDA/expansion card during HotSync just as AvantGo does, so let’s assume the HotSync choice is selected. As Laurens’ help files state, Vade Mecum users (Pocket PC) have to output the document to a memory card or use ActiveSync or a third-party tool such as MobSync to transfer the documents to your device.

OK, but now where will we want to put this content? You have to designate a repository for the documents. So under “Destinations,” click “New.” For Palm users, the drop-down to the right of the word HotSync gives you a choice of RAM or expansion card. If you have one, I encourage you to keep stuff on the card unless it’s important/private material and warrants backing up. You have other choices if you pick internal RAM—your documents can be launched in the Palm OS launcher independent of Plucker, and you can have the Plucker documents backed up during HotSync. If you click OK, you should now see your destination in the box.

The next tab at the top, “Feed”, influences how RSS/Atom feeds are handled. Let’s leave this alone for now; you may want to vary this stuff on a case-by-case basis. The last tab, “Advanced”, can also stay as-is for now; we’ll discuss these features later when we configure documents. If you determine later that particular settings fit your needs better, modify those defaults at that time, and with each new Document, your default settings will be automatically entered.

There are also some general program settings on the menu under View --> Preferences. These settings control how the Program Interface works and the proxy server settings (which ideally don’t need to be messed with), and I’ll let you figure out the interface settings yourself—mostly minor tweaks. The only important setting at this stage is the “Maximum Active Updates” setting. You can simultaneously update from 1-5 documents at a time, depending upon your processing resources and bandwidth availability. I have an old clunker of a computer, so I only set it for one at a time. YMMV.

Selecting appropriate source documents for your SXL

OK, now that we have our defaults, we are finally ready to configure individual documents in this SXL. I suggest you do some “site reconnaissance” first. No, this isn’t a military operation; this is web-site reconnaissance. Laurens has set up Sunrise XP to utilize Microsoft Internet Explorer’s cookies (and cache too) for various reasons. This is important if you wish to download websites that require registration that is retained in a cookie (in my sample SXL, nytimes.com is one of them). Otherwise, such a website will give your Plucker document a “please login” screen, and nothing else. I’m personally partial to Firefox (isn’t everyone?), but use Internet Explorer for site reconnaissance here, to get those cookies registered.

Your “starting webpage” for each document is called the “Source.” Remember that you’re going to start with the Source and linked webpages in Sunrise XP and parse them down to a form that can be viewed in Plucker. MobileRead has many listings for mobile-optimized sites that will work very well as your “Source” and ways to take advantage of them. The links in the item below are just the tip of the iceberg:

http://www.mobileread.com/forums/showthread.php?threadid=6227

Another good option for the Source is to use RSS feeds-- look for a small orange rectangle logo with the letters RSS or XML. MobileRead provides a lot of information on what RSS is. If you click on a link leading to an RSS feed, you won't actually see a webpage, but rather XML code. But that's OK-- the URL for the feed you're looking at is what you want to use in Sunrise.

You’ll generally get very good results viewing mobile versions of websites or RSS in Plucker. RSS and often mobile versions help you avoid unwanted ads, and file sizes will be much smaller. A third possibility is “Printer-Friendly” (often without pictures) or “low bandwidth” versions of websites. The downside is that some of these alternatives may be limited in their offerings and may not include graphics. Experiment—you can even view the Mobile optimized sites right in Internet Explorer, though they’ll look funny on your PC screen.

Site reconnaissance is important while you’re creating the SXL because you need to decide just how much Sunrise XP needs to download to suit your needs. Remember that starting at the Source, Sunrise will download every possible link or graphic that you designate, unless you filter or otherwise restrict what it downloads. For the benefit of both the web host’s bandwidth and your computer’s / PDA’s resources, you want to get what you need, but only what you need, without going overboard. So surf around the website, particularly the Source page. Think about all of the links that are shown on that page, and consider what links are important and what extraneous stuff you shouldn’t download. Are graphics needed to enhance your experience, or are they unnecessary? Will you want to select links, then select links within those links? How can we leave out ads? Are there multiple versions of the same thing (one-page version, printer-friendly version, etc.) or other links (FAQ, “About Us”, “Contact Us”, etc.) that you really don’t care about? Do you want to allow links from outside that website’s domain?

Document Settings - “Main Tab”

Document Properties (Figure 3)

OK, so now you know a bit about the Source. We want to create a new document in the SXL. So in the main menu of Sunrise, select “Edit --> New Document.” The Document Properties box will appear.

You may already see the URL of the web site you were previously visiting (if you’ve already copied the URL from the browser to the clipboard). If not, copy and paste the URL for the Source from the browser’s URL address line directly into the line that says “URL / File.” You also can have Sunrise XP process a local HTML file on your hard drive for viewing in Plucker by hitting the button with the ellipsis (three dots), which will let you browse for the file.

Give your document a name that simply reflects the Source, like “Bill’s Home Page”, “digg.com”, etc. This name is what will show up in Plucker’s Library view. One important feature is the button with the ellipsis to the right of the document name. This gives you the option to put a date stamp in the name of your document, with various formatting choices. If no date stamp is added, every time you update and sync a Plucker document, the new one will have the same name as the old one and overwrite it. (This is fine if you only care to read what’s new day to day, and is what I personally do on all my documents). If you do use the date stamp feature, each time a document is updated, it will have the date appended in the name. If your document name is “digg.com”, and you update it daily, your documents will have names like digg.com 040906, digg.com 041006, digg.com 041106, etc. This way, you can keep multiple copies on your PDA, though you’ll also have to manually delete old copies in Plucker when you no longer want them.
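The naming scheme in the digg.com example can be reproduced with a few lines of Python. This is only a sketch: it assumes the MMDDYY format that the example names suggest (Sunrise XP offers other formatting choices), and stamped_name is a made-up helper, not anything from Sunrise itself.

```python
from datetime import date, timedelta


def stamped_name(base_name, day):
    """Append a MMDDYY date stamp to a document name."""
    return f"{base_name} {day.strftime('%m%d%y')}"


# Three daily updates starting on April 9, 2006.
start = date(2006, 4, 9)
names = [stamped_name("digg.com", start + timedelta(days=i)) for i in range(3)]
print(names)  # ['digg.com 040906', 'digg.com 041006', 'digg.com 041106']
```

Because each day's name is different, nothing gets overwritten, which is exactly why the old copies pile up until you delete them.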

You can categorize your documents if you wish; this simply means that in Plucker, you can view all documents in the main library listing page, or only view one category at a time.

You need to designate a Link Depth, which indicates how many layers down you want to go from the Source—probably the most important variable you can select. On color screens, Plucker allows you to see available links (in blue) and unavailable links (in red) so you can see all the links available to you.

If your Source webpage (and the images on it) is all you want and you’re not going to want to click on any links, then your Link Depth should be set to 0, and all links on the page will show up as “unavailable” in Plucker. If you do want to visit links from that Source page and select Link Depth 1, you will be able to click on links within the Source, but once you are one link in, you will not be able to click on any of the links in THAT page (unless those links have already been downloaded because they also appear one layer down from the Source). If you select Link Depth 2, you will be able to click on links two layers in, etc. Note that even if you go only two links in, you could get a huge amount of content if the Source and other pages have a lot of links, so generally you’re not going to want to go beyond Link Depth 2. However, there are several ways to avoid downloading a huge number of links. One of the simplest ways is to restrict links by domain, server, or directory. I’m going to use Laurens’ example (found in the Sunrise XP help):

In this example, the Source URL is: http://www.server.com/directory/index.html

Setting description & examples

Restrict to domain: Download only links in the "server.com" domain:

http://www.server.com/index.html
http://images.server.com/logo.gif

Restrict to server: Download only links on the server "www.server.com":

http://www.server.com/index.html
http://www.server.com/images/logo.gif

Restrict to directory: Download only links in the directory "www.server.com/directory/" or any of its subdirectories:

http://www.server.com/directory/index.html
http://www.server.com/directory/images/logo.gif

Note that images embedded in HTML pages are not subject to the link restriction setting. This behavior is by design, as many sites store their images on another server or domain. Link filters (discussed later) will provide other options to restrict unnecessary content.
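For readers who think in code, link depth and the three restriction settings can be pictured together as a breadth-first walk over links. The Python sketch below works on a made-up, in-memory link graph rather than real web pages. The function names are hypothetical, and treating the "domain" as the last two labels of the host name is a simplifying assumption; none of this is taken from Sunrise XP's actual implementation.

```python
from collections import deque
from urllib.parse import urlsplit
import posixpath


def in_scope(url, source, restrict):
    """Return True if url passes the chosen restriction relative to source."""
    u, s = urlsplit(url), urlsplit(source)
    url_host, src_host = u.hostname or "", s.hostname or ""
    if restrict == "domain":
        # Assumption: the "domain" is the last two labels of the host name.
        domain = ".".join(src_host.split(".")[-2:])
        return url_host == domain or url_host.endswith("." + domain)
    if restrict == "server":
        return url_host == src_host
    if restrict == "directory":
        # Directory of the Source, e.g. "/directory/" for the example URL.
        folder = posixpath.dirname(s.path).rstrip("/") + "/"
        return url_host == src_host and u.path.startswith(folder)
    raise ValueError(f"unknown restriction: {restrict}")


def pages_to_fetch(links, source, link_depth, restrict):
    """Breadth-first walk over a link graph, stopping at link_depth."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if depth[page] == link_depth:
            continue  # links on this page are not followed ("unavailable")
        for target in links.get(page, []):
            if target not in depth and in_scope(target, source, restrict):
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth  # maps each fetched page to its distance from the Source
```

With the Source URL from the example, in_scope("http://images.server.com/logo.gif", source, "domain") is true, while the same link fails the "server" test because the host name differs. Raising link_depth by one adds a whole new layer of pages, which is how the download size balloons so quickly.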

You have a checkbox that gives you the option to only update the document if the Source has changed. This saves processing power and bandwidth. Is there really a point in downloading everything again if the Source hasn’t changed? If this setting is checked, Sunrise will make that check and not modify the document if unchanged. That’s why there are two columns in the main SXL window labeled “Last Update” and “Last Modified.” “Last Update” is the last time Sunrise checked for changes. “Last Modified” is the last time there actually was a change made.
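The difference between the two columns can be sketched in Python. How Sunrise XP actually detects a change is not stated here, so comparing a hash of the downloaded Source is purely an assumption for illustration, and DocumentState and check_source are invented names.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DocumentState:
    digest: Optional[str] = None              # fingerprint of the last Source seen
    last_update: Optional[datetime] = None    # last time we checked
    last_modified: Optional[datetime] = None  # last time the content changed


def check_source(state, content, now):
    """Record a check; return True only if the document must be rebuilt."""
    digest = hashlib.sha256(content).hexdigest()
    state.last_update = now  # every check counts as an update attempt
    if digest == state.digest:
        return False  # Source unchanged: leave the existing document alone
    state.digest = digest
    state.last_modified = now
    return True
```

Checking the same unchanged page twice moves "Last Update" forward both times but leaves "Last Modified" where it was after the first check.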

The “images” settings were discussed earlier. Modify as appropriate for your Source page and PDA type. Keep in mind that if you go outside of the Source’s domain for links, those pages might be different as far as images, appearance, etc. Unless space or hardware is an issue, I suggest you keep the colors as high as possible.

Document Settings - “Output Tab”

Documents Output Tab (Figure 4)

The second tab, output, allows you to designate when and where updates will be generated and saved. The check box that says “disable automatic and scheduled updates” means that you want to manually initiate them yourself, or not at all. Perhaps you follow someone’s blog daily and they are going on a three-month absence without postings. You could use this setting to temporarily discontinue the automatic updates, yet keep the document in your SXL for later when the blogger is back (Figure 4).

Documents Schedule (Figure 5)

Scheduling information is put into the top box by clicking “New” and setting a schedule. This is useful, let’s say, if your source website only changes daily or weekly or monthly. You can also set updates for smaller time increments by using the “hourly” settings. I like the New York Times Magazine section, which only is published on Sundays. So my schedule for that is “Every Sunday at 12 AM.” It only gets updated the first time each week that Sunrise XP runs after midnight Sunday morning (Figure 5).

If you leave the schedule box blank, Sunrise XP will always check the Source for updates every time it is run.
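The "first run after the scheduled time" behavior can be sketched in Python. This is a guess at the logic based on the description above, not Sunrise XP's real scheduler, and the (weekday, hour) schedule format and function names are invented for the example.

```python
from datetime import datetime, timedelta

SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6


def latest_slot(now, weekday, hour):
    """Most recent scheduled moment (weekly, on the hour) at or before now."""
    slot = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    slot -= timedelta(days=(now.weekday() - weekday) % 7)
    if slot > now:
        slot -= timedelta(days=7)  # today's slot has not arrived yet
    return slot


def is_due(last_update, now, schedule=None):
    """schedule is a hypothetical (weekday, hour) pair, or None if left blank."""
    if schedule is None or last_update is None:
        return True  # blank schedule: check the Source on every run
    return last_update < latest_slot(now, *schedule)
```

With a schedule of (SUNDAY, 0), a run on Monday morning is due if the document was last updated on Saturday night, but not if it was already updated earlier on Sunday.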

The destination information on the output tab should reflect your default setting from before. Change it if necessary for this particular document. (For example, you usually put your documents on the expansion card, but you wish to store this particular document in the internal RAM.)

Document Settings - “Feeds Tab”

Documents Feeds Tab (Figure 6)

The “Feeds” tab really only applies if you are using RSS/Atom feeds as described above; you can skip this tab (and this section of the tutorial) for regular websites, as these settings have no effect on regular HTML websites. The first two items, logo and blurb, are simply the title/logo of the feed and a brief description of the feed. Do what feels good. The next dropdown selector, “Layout”, gives you three choices (Figure 6). If you do use RSS feeds, experiment to see which configuration works best for you on each feed:

Single Page List: This choice basically makes all the separate RSS summary listings into a single “Source” page when you open the document in Plucker—kind of like a simplified home page without any extraneous links, graphics, etc. You then have access to the links to read the full article if you want. This option works well if there isn’t an inordinate number of items in the RSS list and the generated Source page is not unwieldy.

Single Page List Plus Index: This choice is the same as the Single Page except that you have an index at the beginning with link names. If you have a lot more RSS items and/or the descriptions are longer, this index at the beginning lets you read the title of the RSS item on the Source page, then jump forward to the full text lower down on the Source page.

Multiple Page Plus Index: This choice starts you out on a Source page that only contains an index listing each of the items in the feed. From the index, you can navigate to each item description on its own separate page. This choice also adds navigational links at the top and bottom of each page, so you can simply look at the feed items in order and jump to the next one as you wish. This choice is recommended for feeds with full entries that have a lot of content—This is how I read Palm Addict, which often has 60 or 70 items in the feed at any one time, and items are often lengthy.

The other feed settings should be reasonably apparent, and affect how feeds are saved (to keep track of whether content changes).

Document Settings - “Advanced Tab”

Documents Advanced Tab (Figure 7)

There are a few important settings on the advanced tab that you should consider (Figure 7).

Just like your regular PC browser, Sunrise XP can cache downloaded content, and it’s suggested that you select the box to do so if not already checked. The cache is the same one that Internet Explorer uses. By checking the cache, Sunrise will not download content if you already have the identical file cached on your PC. This will make the overall process faster as well as save bandwidth for the web site’s host. Control of the size of this cache is done through Internet Explorer’s menu (or control panel’s “Internet Options” menu).

As discussed earlier, Sunrise XP can use your Internet Explorer cookies, so check this box if the Source’s website uses cookies to login (such as a newspaper, MobileRead, etc.)

Priority is something I don’t use, but basically you can prioritize Sunrise’s sequence for downloading documents. Unless you prioritize them, Sunrise will update documents in alphabetical order of the document’s name (first item in “Main” tab).

I always have the “Include URL info” box checked. What this means is that if you are in a Plucker document and you want to know the specific URL of what you’re reading for future reference, Plucker will be able to display it. It’s also helpful if you want to view the URL for a website that is beyond your link depth, or has Flash or other content that Plucker cannot display. In either case, Plucker can copy the URL to the PDA’s Memo Pad, a very useful feature. Laurens’ instructions state that the “Include URL info” should not be checked if your Source document is a local file on your hard drive (which also can be processed by Sunrise for viewing on your PDA).

The “Don’t display unresolved links” checkbox is a matter of personal taste; I never check it. As noted earlier, Plucker can display unreachable links (in red) or accessible links (in blue). If you check this box, the unreachable links will not be marked as links at all and will just appear as plain text, which might be less distracting for you. I like to know if there’s a link I can’t reach, because I might want to find out the URL for later viewing.

The “Link Filters” is a very important setting. We’ve already had the option to filter out content from different domains, etc. Here, you can designate specific URLs that you don’t want to download, or provide Sunrise with wildcards that filter URLs that have a specific pattern of characters. If you were good with your earlier “site reconnaissance”, you’ll know which links you don’t want and which links you do want. Link filters are processed in the order you present them. The easiest way to illustrate what the filters do is to look at the link filters that I use for all New York Times downloads (Figure 7).

What each filter in the image does from top to bottom

Documents Link Filter (Figure 8)

Basically, I want to read the New York Times articles, and nothing else. This first filter limits most of my downloads to actual articles. Any page whose URL starts with “http://www.nytimes.com/20” will be downloaded; the filter is written as “http://www.nytimes.com/20*”. (The asterisk is a wildcard; it stands for any possible text.) This is how almost all of the NY Times’ article URLs are set up. For example, the first article on today’s page has the URL: http://www.nytimes.com/2006/04/09/world/asia/09cnd-nepal.html so this filter would allow this link to be downloaded. Any links that don’t follow this convention are probably not content I want. The main section pages (National, Washington, Sports, etc.) do NOT follow this URL convention, but they seemed superfluous since I primarily only want to see content from the Source (front) page, so I intentionally had the filter work as it does (Figure 8).

The next filter is probably not necessary, but I wanted to be sure that all images come through, so by using “*com/images”, any URLs that end with those characters will be included.

I noticed after using Sunrise for a while that none of the travel articles were ever downloaded. At one time, the New York Times had a different convention for assigning URLs in the travel section, though this no longer seems to be the case. The http://travel2* wildcard enabled me to get all the travel articles.

Many articles in the New York Times website are stretched over multiple pages, and you are provided a link for a “single page version.” You also are often provided a “printer friendly version”. Both of these alternate versions are superfluous, since they have the same content as the primary pages, so I wanted to not download those versions. The URLs of all printer-friendly versions of articles end with the text “pagewanted=print”, so I had the filter ignore those URLs. Similarly, “pagewanted=all” gives you the single-page duplicate, so I set up the wildcard to filter out those URLs as well. I’ll leave it to you to figure out what the filters */fashion/* and *privacy* filter out.

You’ll have to do a little trial-and-error if you have a lot of things you’ll want to filter in or filter out, but once you’re set up (as long as the web site doesn’t change its conventions for URLs), it works great. To create a filter, hit the “new” button and you’ll get the “link filter” box, which gives you various choices. For the pattern, you can use either a “regular expression” (a Perl-style pattern, covered in the section on rewriting links) or a Wildcard, which uses one or more asterisks as I did above, each of which can represent anything. I never change the “Filter all Links” drop-down, but you could have it filter specific HTML tags. (Don’t worry, I barely know what that means myself…) You then need to decide whether you want to only include or only exclude URLs following your wildcard designation.
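If it helps to see the wildcard idea in code, here is a small Python sketch using the standard fnmatch module. The filter list is loosely modeled on my New York Times filters, but the exact pattern strings for the print and single-page versions are guesses, and the rule used to combine filters (download a link if it matches at least one include filter and no exclude filter) is an assumption. Sunrise XP processes filters in order and its real rules may differ.

```python
from fnmatch import fnmatchcase

# (action, wildcard pattern) pairs; the exclude patterns are assumed spellings.
FILTERS = [
    ("include", "http://www.nytimes.com/20*"),
    ("include", "http://travel2*"),
    ("exclude", "*pagewanted=print"),
    ("exclude", "*pagewanted=all"),
    ("exclude", "*/fashion/*"),
    ("exclude", "*privacy*"),
]


def should_download(url, filters=FILTERS):
    """Assumed rule: at least one include matches and no exclude matches."""
    included = any(a == "include" and fnmatchcase(url, p) for a, p in filters)
    excluded = any(a == "exclude" and fnmatchcase(url, p) for a, p in filters)
    return included and not excluded
```

Under these assumptions the Nepal article URL passes, the same URL with "?pagewanted=print" appended is dropped, and a section page that does not start with "http://www.nytimes.com/20" is never fetched at all.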

At this point, if you have different documents that you still want to add to the SXL, go back to Step 6 and add new documents, filling in all the needed data from the four tabs.

Once you have all your documents entered into the SXL, you’re going to want to save the SXL somewhere on your PC. I keep all my SXLs in a folder called …/My Documents/Plucker.

Rewriting Links

This is an example of how to rewrite links, with a couple of extra tricks thrown in as well. Let's say you want to grab the columns by Chuck Colson from the Christianity Today website. You'll find them at:

http://www.christianitytoday.com/ctmag/features/columns/colson.html

Notice that this page, and each of the article pages, are loaded with ads, unwanted links, etc. On the right, click the link for the printer version. When the printer-friendly box opens, right-click on it and select Properties from Internet Explorer or View Page Info from Firefox. You will find that the URL for the printer-friendly page is:

http://www.christianitytoday.com/global/printer.html?/ctmag/features/columns/colson.html

This is the URL to use in the URL/File field on the Main tab when you create the Sunrise XP document. You'll directly load the printer-friendly main page, eliminating the junk. Now click on the link for a specific column and you get something like this:

http://www.christianitytoday.com/ct/2006/002/19.144.html

The exact URL depends on the article you clicked. Again open the printer version, right-click and select Properties or View Page Info. The printer-friendly URL is:

http://www.christianitytoday.com/global/printer.html?/ct/2006/002/19.144.html

Now create your Sunrise XP document and create a link filter. Select "Regular Expression" for Match, "Filter all links" for Links, and "Rewrite links matching this pattern" for Filter.

Now, how do you turn the article link into the printable link? Notice that they are identical up to ".com", then the printable link has some extra stuff (/global/printer.html?), then they end identically. If you check several articles, you'll see that the ending part is different for each article. You need to tell Sunrise to stick the extra text in ahead of the article-specific stuff no matter what it is. You start by specifying the part that is identical for all articles, then replace the rest with "(.*)", which essentially says, "match everything here no matter what it is". The result is:

http://www.christianitytoday.com(.*)

but the "." is a special Perl character, so you must put a backslash in front of it when you want it to be taken literally. Now you have:

http://www\.christianitytoday\.com(.*)

That's what goes in the Pattern field for the link filter. Not only will that match the link for any article, but the (.*) part will also grab all of the last part of the text and save it. Later, you can refer to it as "$1"

Now to rewrite the link, you want the part up to ".com", plus the extra stuff you need to insert, followed by the stuff saved as $1. You can write this as:

http://www.christianitytoday.com/global/printer.html?$1

This is what you put in the Rewrite field.

In more complex cases, you may need to use more than one "(.*)". In such a case, when you do the rewrite, the first becomes $1, the second $2 and so on.
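You can try the pattern and rewrite outside Sunrise XP before committing to them. The Python sketch below uses Python's re module, whose syntax is close to Perl's for this simple case. One difference to note: Python writes the saved group as \1 where the Rewrite field uses $1. Requiring the whole URL to match is an assumption about how Sunrise applies the pattern, and rewrite_link is just a made-up helper name.

```python
import re

PATTERN = r"http://www\.christianitytoday\.com(.*)"
# Python's template syntax uses \1 where the Rewrite field uses $1.
REWRITE = r"http://www.christianitytoday.com/global/printer.html?\1"


def rewrite_link(url):
    """Rewrite a matching article link to its printer-friendly form."""
    match = re.fullmatch(PATTERN, url)  # assumption: the whole URL must match
    if match is None:
        return url  # links that do not match are left alone
    return match.expand(REWRITE)


print(rewrite_link("http://www.christianitytoday.com/ct/2006/002/19.144.html"))
# http://www.christianitytoday.com/global/printer.html?/ct/2006/002/19.144.html
```

The dots and the question mark in the rewrite text need no backslashes, because only the pattern is interpreted as a regular expression.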

Check out this tutorial for more info on Perl Regular Expressions.

An important rule to remember is that, if you use "include" or "exclude" filters along with "rewrite" rules, Sunrise XP will rewrite the links before it applies the filters. A filter that would work on the original form of a link may not work on the rewritten form. Conversely, a filter that would exclude only the links you don't want if applied to their original form may exclude links you want when applied to the rewritten version.

For example, suppose you want to exclude a link to:

http://www.domain.com/garbage

You could write a rule that excludes all links of the regular expression form:

.*garbage

This will work. Now suppose you also want to rewrite all of the remaining links to add "&printer" to the end. Since you've already excluded the "garbage" link, you might decide to look for:

(.*)

and rewrite this as:

$1&printer

Again, this will work by itself. The problem, however, is that SunriseXP does the rewrite first, then the filter. After the link you don't want is rewritten, it will be:

http://www.domain.com/garbage&printer

This will no longer match:

.*garbage

and the link will not be excluded. You must either be more specific in what links you want to rewrite, so that the garbage link will not get rewritten, or you could change the exclude filter to look for:

.*garbage&printer
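The ordering problem can be reproduced in a few lines of Python. This is only a sketch of the behavior described above, not Sunrise XP's implementation. The function name `process` is hypothetical, and it assumes that a pattern has to match the entire link, which is consistent with ".*garbage" no longer matching once "&printer" has been appended.

```python
import re

EXCLUDE = r".*garbage"       # exclude filter from the example
REWRITE_PATTERN = r"(.*)"    # rewrite rule: match any link ...
REWRITE_SUFFIX = "&printer"  # ... and append this to $1


def process(link, exclude=EXCLUDE):
    """Rewrite first, then apply the exclude filter; return None if excluded."""
    match = re.fullmatch(REWRITE_PATTERN, link)
    rewritten = match.group(1) + REWRITE_SUFFIX if match else link
    if re.fullmatch(exclude, rewritten):
        return None
    return rewritten


link = "http://www.domain.com/garbage"
print(process(link))                                # http://www.domain.com/garbage&printer
print(process(link, exclude=r".*garbage&printer"))  # None
```

With the original exclude filter the unwanted link slips through in its rewritten form. With the adjusted filter it is dropped.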

Additional link rewriting examples

ComputerWorld.com

Like the period, the question mark is also a special character in Perl regular expressions. If you want to convert links that contain a question mark, you will need to include a backslash in front of the question mark for Sunrise to take it literally.

For example, let’s say that you want to read ComputerWorld.com’s Handheld RSS feed. http://feeds.computerworld.com/Computerworld/Handhelds/News

The articles that link from this feed are in this format: http://www.computerworld.com/action/article.do?comand=viewArticleBasic&articleId=9001496&source=rss_topic75

(I have intentionally misspelled the word “command” in all instructions and links because spelling it correctly was causing a server problem for some reason. Please add the missing “m” if you wish to use or view any of the links I use in this example.)

You will need to convert this link to the Printer Friendly version to read on your handheld device. The printer links follow a similar format. http://www.computerworld.com/action/article.do?comand=printArticleBasic&articleId=9001496

The difference between the regular browser view and the printer friendly view is the ‘command’ section of the link (comand=view... and comand=print...). Notice that there is also additional information at the end of the regular link that isn’t needed in the printer friendly link (&source=rss_topic75). The topic is different for each RSS feed. For example, if you wanted the ComputerWorld Linux feed, the link would be the same except for the topic, which would be 122. You don’t need to match the exact topic number. Instead, use the Perl wildcard (.*) to capture any topic number and then ignore it when you rewrite the link.

To rewrite the ComputerWorld handheld link, you will need to do the following. (Remember to spell “command” correctly in each step.)

Set up the feed as described in Document Settings – “Main Tab”
Create a new Link Filter in the Sunrise advanced tab
Set Pattern to:
http://www\.computerworld\.com/action/article\.do\?comand=view(.*)&source(.*)
- Each period and question mark is preceded by a backslash so that Perl will take the punctuation marks literally.
- The first wildcard appears after “comand=view.” The second appears after “&source.” (I did not write the entire ending (&source=rss_topic75) because some ComputerWorld feeds do not include the word topic and I want this code to work for any feed I want to download.) The first wildcard will capture any link information between “comand=view” and “&source.” In this example, that information is “ArticleBasic&articleId=9001496.” The second wildcard will capture everything that follows “&source.” In this example, that information is “=rss_topic75”
Set Match to “Regular Expression”
Set Links to “Filter <a> anchor links”
Set Filter to “Rewrite links matching this pattern”
Set Rewrite to:
http://www.computerworld.com/action/article.do?comand=print$1
- Notice that backslashes are not required before periods or question marks in the Rewrite section. This is because the Pattern field is being read as a Perl expression and the Rewrite field is simply an output field. What you type in will be displayed literally.
- The ending character “$1” adds the information from the first wildcard to the information that precedes it. In this example, the completed link will be:
  http://www.computerworld.com/action/article.do?comand=printArticleBasic&articleId=9001496
- The second wildcard (that followed “&source” in the Pattern field) is not included in the Rewrite field in this example because we didn’t want the “&source” and everything that followed to be included in the rewritten link. Basically, Sunrise captured the information but did not use it, which effectively deleted it.
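To see what each wildcard captures, here is a small Python sketch that applies the Pattern above to the sample link. It uses Python's `re` module as a stand-in for Perl and assumes the pattern is matched against the whole link. It is an illustration only, not how Sunrise XP works internally. "comand" is misspelled here exactly as in the rest of this example.

```python
import re

# Pattern field, with "comand" misspelled as in the text above.
PATTERN = r"http://www\.computerworld\.com/action/article\.do\?comand=view(.*)&source(.*)"

link = ("http://www.computerworld.com/action/article.do"
        "?comand=viewArticleBasic&articleId=9001496&source=rss_topic75")

match = re.fullmatch(PATTERN, link)
first, second = match.groups()
print(first)   # ArticleBasic&articleId=9001496
print(second)  # =rss_topic75

# Rewrite field: only $1 is used, so the second capture is simply dropped.
printable = "http://www.computerworld.com/action/article.do?comand=print" + first
print(printable)
```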

More filtering for ComputerWorld.com

As with any feed, you must check to see if there are some links that you don’t want to download to your handheld. For example, each feed from ComputerWorld contains two links that we do not want to download because they will needlessly increase the size of the document.

A link for more news on the topic which leads to the topic’s section front page. This page does not have a printer friendly version.
A link for special live Webcasts that do not have any text to read or printer friendly versions.

Fortunately, getting rid of these items is simple. In the advanced tab, you will need to do the following.

Make a new Link Filter
Set Pattern to: *index.jsp
- Each “More News” page ends in “index.jsp”. No articles end with it, so it is a safe filter to use.
Set Match to “Wildcard”
Set Links to “Filter all links”
Set Filter to “Exclude links matching this pattern”
Leave Rewrite blank. (You shouldn’t be able to edit it when Filter is not set to Rewrite.)

To get rid of Webcasts, you’ll need to repeat the above steps but set Pattern to: http://www.computerworld.com/action/webcast*

Finally, as with some other websites, ComputerWorld includes ads on the printed versions of their articles. To get rid of these, repeat the above steps but set Pattern to: http://ad.doubleclick.net* (If a printed page of another website includes advertisements, you can find the URLs for them by looking at the printer friendly page’s source. Instructions for doing this are here.)

The code in the above example works for any ComputerWorld feed and can be easily modified for other feeds.
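The three exclude filters can be checked against sample links with Python's `fnmatch` module. This sketch assumes that "*" in Sunrise XP's Wildcard mode means "any run of characters" and that the pattern must cover the whole link, as in shell-style globbing. Sunrise XP's exact matching rules may differ. The function name `is_excluded` and the section-page and ad links in the usage lines are made up for illustration.

```python
from fnmatch import fnmatchcase

# The three exclude patterns from the steps above.
EXCLUDE_PATTERNS = [
    "*index.jsp",
    "http://www.computerworld.com/action/webcast*",
    "http://ad.doubleclick.net*",
]


def is_excluded(link):
    """Return True if any wildcard exclude pattern matches the whole link."""
    return any(fnmatchcase(link, pattern) for pattern in EXCLUDE_PATTERNS)


# Hypothetical section front page and ad link: both should be excluded.
print(is_excluded("http://www.computerworld.com/handhelds/index.jsp"))
print(is_excluded("http://ad.doubleclick.net/some/ad"))
# Printer friendly article link from the example: should be kept.
print(is_excluded("http://www.computerworld.com/action/article.do"
                  "?comand=printArticleBasic&articleId=9001496"))
```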

MSNBC

Some links only need to have some additional information appended to the end to make them printer friendly links. These are very easy to set up. Let’s say that you want to add MSNBC’s Technology & Science Headlines to your document list.

The articles that link from this feed are in this format:
http://www.msnbc.msn.com/id/13601613/

MSNBC Printer friendly documents are in this format:
http://www.msnbc.msn.com/id/13601613/print/1/displaymode/1098/

The only difference between the two formats is the addition of “print/1/displaymode/1098/” to the end of the link. You will need to create a link filter that will rewrite the original link with the printer information.

Set up the feed as described in Document Settings – “Main Tab”
Create a new Link Filter in the Sunrise advanced tab
Set Pattern to:
http://www\.msnbc(.*)
- As always in the Pattern field, each period is preceded by a backslash so that Perl will take the punctuation mark literally.
- All articles on MSNBC begin this way. Some begin http://www.msnbc.msn so placing the wildcard after msnbc ensures that each article will be downloaded correctly.
- In this example, the wildcard will catch the following information:
  “.msn.com/id/13601613/”
Set Match to “Regular Expression”
Set Links to “Filter <a> anchor links”
Set Filter to “Rewrite links matching this pattern”
Set Rewrite to:
http://www.msnbc$1print/1/displaymode/1098/
- Notice that backslashes are not required before periods or question marks in the Rewrite section. This is because the Pattern field is being read as a Perl expression and the Rewrite field is simply an output field. What you type in will be displayed literally.
- “$1” can be added anywhere that you need the captured wildcard information to be inserted. In this example, the completed link will be:
  http://www.msnbc.msn.com/id/13601613/print/1/displaymode/1098/
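The append-only case is simple enough to sketch in a few lines of Python. As before, this is an illustration using Python's `re` module, not Sunrise XP's code. The function name `printable` is hypothetical, and the sketch assumes the pattern must match the whole link.

```python
import re

PATTERN = r"http://www\.msnbc(.*)"
SUFFIX = "print/1/displaymode/1098/"


def printable(link):
    """Rebuild the link as 'http://www.msnbc' + $1 + the print suffix."""
    match = re.fullmatch(PATTERN, link)
    if match is None:
        return link  # not an MSNBC link; leave it alone
    return "http://www.msnbc" + match.group(1) + SUFFIX


print(printable("http://www.msnbc.msn.com/id/13601613/"))
# http://www.msnbc.msn.com/id/13601613/print/1/displaymode/1098/
```

Note that this rewrite relies on the original link ending with a slash. A link without the trailing slash would have "print/1/..." glued directly onto the article number.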

The HotSync Conduit

NOTE: the HotSync conduit currently does not work on some Windows configurations and/or Palm OS devices. (See this discussion.) If you encounter issues, simply uninstall Sunrise XP and reinstall it without the HotSync conduit. The standard Install and Install To Card conduits will then be used instead to transfer documents to your handheld. You will, however, lose the ability to update documents during HotSync.

Custom HotSync conduits (Figure 9)

OK, so once you’ve added all the documents you wish to your SXL, it’s time to get Sunrise working. The first thing you should do is check your HotSync conduit. Your HotSync Manager should already be running (with a HotSync icon in the system tray at the lower right corner of your screen). Start it if it isn’t running. Left- or right-click on that icon, and you’ll get a pop-up menu. Select “Custom” (Figure 9).

This will display a list of the HotSync “conduits”, which are the settings that dictate how data for different applications is synced between PC and PDA. Highlight the Sunrise XP conduit and click on the “change” button.

Sunrise XP conduit settings (Figure 10)

You’ll now see a box with the heading “Sunrise XP Conduit” (Figure 10). If not already there, click on the “Update” tab. You will need to designate any SXL that is to be regularly updated during a HotSync. Click “Add”, and then browse over to the location where you have your SXLs saved (…/My Documents/Plucker in my case). Select the file and click OK. Make sure to check the box next to the SXL. (Obviously unchecking this box will disable updating of this SXL if you ever wanted to do that).

You don’t need to worry about the “view files” setting below. It just shows any documents that have been already generated by Sunrise and are awaiting a HotSync to load them.

Click on the “Action” tab. The two choices at the top are whether you want the conduit to be active or not, which we do. If you wanted to disable this conduit, you would select “do nothing.” Then select whether your choice should become the default setting. If you do not designate it as the default, your choice applies only to the next HotSync, and the conduit reverts to the default on the following HotSync.

Sunrise XP needs a small “stub” on the PDA. This should have been installed as part of the basic installation process. If you’re having problems with the conduit not working, click on the button to have it re-install the stub at the next HotSync.

Click on the OK button to close out of the conduit settings. The list of conduits should now show that Sunrise XP is enabled (Desktop PC overwrites handheld). If the conduit is disabled, it will read “Do Nothing.”

Update during HotSync

Sunrise XP conduit update (Figure 11)

Finally, we’ve created an SXL and set up the conduit! The next time you HotSync, if everything proceeds correctly, a box should pop up when the HotSync reaches the stage where Sunrise XP updates your documents. It shows the progress of downloading the web sites in your SXL. After Sunrise has finished, the rest of the HotSync process will continue, and once the HotSync is done, you can open Plucker on your PDA and enjoy your documents (Figure 11).

By the way, you can also update documents manually in Sunrise XP, independently of the HotSync process, using the menus, the on-screen buttons, or by right-clicking on a document. You have various choices on how to update your documents.

AutoUpdate

As an alternative to updating documents during HotSync, you can also let Sunrise XP update documents automatically when they become due. (For this to happen, the documents must be scheduled explicitly.) To enable AutoUpdate, click the icon on the toolbar or select "Update -> AutoUpdate" from the main menu.

You do not have to leave Sunrise XP running for AutoUpdate to work, as it creates a Windows Task Scheduler job automatically upon exit. (If you don't want this to happen, you can disable Windows Task Scheduler integration in Preferences.)

Troubleshooting or Investigating with the Update Report

One nice feature of Sunrise XP is that if you’re having problems (or are simply curious to see how well your site reconnaissance worked), the program provides a log of what has been downloaded for each document in the SXL. This log isn’t generated until after you’ve actually run Sunrise XP and updated the document. To access this log, highlight the document you are interested in on the main Sunrise XP screen and select "Update -> Open Report" from the menu. An HTML-formatted report will come up, and you can see which links or images Sunrise downloaded (or attempted to download) (see this example).

Showcase

The Showcase (download) is a list of predefined document settings for newsfeeds from BBC News, New York Times and several other sites. It contains link rewriting rules for fetching printer-friendly versions of articles.

To copy the document settings to your own SXL:

1. Open your own SXL.
2. If you haven't done so already, set up default properties for new documents using "Edit -> Default Properties".
3. Open "showcase.sxl" from the Showcase ZIP file.
4. Pick the documents of interest and select "Edit -> Copy" to copy them to the clipboard.
5. Switch back to your own SXL and select "Edit -> Paste Special". This will open a secondary dialog.
6. In the Paste Special dialog, make sure "Image Settings" and "Destinations" are unchecked. (These settings will then be taken from the default properties that you configured in step 2.)
7. Click OK to dismiss the dialog and paste the documents into your own SXL.
8. Update the documents to verify that they work as intended.

More Information

SunriseXP_reference contains detailed information on each setting, including the application preferences (which this tutorial does not cover).

Credits

Originally written by doogie68 and first posted on the MobileRead forums.
Link-rewriting part contributed by DTM.
Minor edits and additional material by Laurens M. Fridael.
Additional link-rewriting examples written by Chris.