Google vs. Book Monopolists

In an article on Google’s efforts to scan books, USA Today highlights part of what’s wrong with US copyright law:

Richard Hull, executive director of the Text and Academic Authors Association, called Google’s approach backwards. Publishers shouldn’t have to bear the burden of record-keeping, agreed Sanfilippo, the Penn State press’s marketing and sales director.

“We’re not aware of everything we’ve published,” Sanfilippo said. “Back in the 50s, 60s and 70s, there were no electronic files for those books.”

Memo to Richard Hull: Google is forwards, you are backwards. Why do you care if Google makes a few sentences from books you’ve never heard of available? If you were really all that concerned with keeping control of those books, you would have kept track of them. In a sane system you have to actually know what work you’re protecting the copyright of. You can’t just say “well, I don’t remember my copyrighted work and can’t show any infringement, but dammit they better stop because they might infringe on my copyright!”

Thanks to the US Copyright Act of 1976, our current copyright law gives a monopoly to people who don’t even want one. Why are we giving monopolies of human knowledge to absentee landlords who can’t be bothered to keep track of their work, but who freak out at the possibility that their long-forgotten work might be used without their permission?

Google Maps thoughts

After the bajillionth Google Maps site it hit me: Google Maps is at its core a nifty UI widget for a common type of data. The OS doesn’t provide a widget for dealing with location data, and the web browser as a subset of OS widgets certainly doesn’t (hello combo box!). That’s like 10% of what makes Google Maps so cool, which is a lot considering how cool it is. People with location data are scrambling to put their data in a format that’s usable. Google Maps is literally changing the way people think about place.

Your homework assignment tonight is to think about what common types of data people have, and what kind of UI widget could be created specifically for browsing that data. Then create a multi-billion dollar business around your new web service. Bonus points for sucking up to my sympathetic nature towards the Semantic Web.

Always on Google

My girlfriend and I are sitting in a semi-darkened theater on Thursday night. The ticket says 9:45 PM, it is now that time, and we have just finished watching 10 minutes of commercials. Instead of the movie, we get an in-depth ad for Ron Howard’s new movie and a great big view of Ron Howard. He is not looking good, probably because they didn’t bother with makeup for what is destined to become a DVD extra.

We can’t get over how bad he looks, like he’s a junkie one fix away from hitting bottom. I say it must be age, figuring that Opie was in black and white so it must have been quite a while ago. She says he isn’t that old. Normally this would be an impasse and we would move on.

With no interest in the Ron Howard featurette, I send an SMS saying “Ron howard age” to 46645 and get an immediate response saying

Q&A: Ron Howard Date of Birth:1 March 1954 Source www.who2.com/ronhoward.html

And yes, my phone is on silent.

Small things like that are what Howard Rheingold has been writing about for a while. If I can find out something as arbitrary as Ron Howard’s birth date from somewhere as disconnected as a movie theater seat, our entire relationship to information is changing. Kudos to Google for succeeding in its mission to “organize the world’s information and make it universally accessible and useful.”

Google Web Accelerator

So, Google Web Accelerator. Like I said in del.icio.us, proxy servers can be a huge pain for web developers, and putting Google’s name on one is inviting a shitstorm. Putting the web development stuff off for a moment, let’s talk about why Google is providing this.

First off, it’s helpful. There’s no market for tools that fuck users over, and the Google culture seems to be built around helping users. OK, but what’s in it for Google aside from being helpful? It doesn’t center around becoming more privacy invasive, although I’m sure their advertising division will use it to sell you herpes medication.

Tristan Louis explains half of the benefit: Google will use the data to find information it hasn’t found yet. You should really read that if you haven’t already; Tristan has covered every inch of that argument. Summarized for people who read it and forgot (you’re not too lazy to click that link, are you? Of course not): Google will use the URLs to find pages that its spiders missed or haven’t gotten to yet, so that there are more results with newer information.

The other half is that PageRank is in trouble. When it first debuted, it brought order to the web. But it’s based on voting, and voting can be abused. Spammers are waging war on PageRank and Google has to respond manually. Look at how Syndic8’s link farming got shut down: Andy Baio had to make a blog post before they lost their PageRank. That is not a scalable solution and Google is a company that scales up with computers, not employees.

The way PageRank works is that it counts each link as a vote for a page, and votes from pages with higher PageRank count for more. This is used to organize the web: Google infers which pages are most popular from the links pointing to them.
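To make the voting idea concrete, here is a minimal sketch of the textbook version of PageRank, computed by repeated redistribution of rank along links. It is not Google's production algorithm: the toy link graph, the `pagerank` function name, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, `c` ends up ranked highest because three pages vote for it, and `a` outranks `b` and `d` because its single incoming vote comes from the highly ranked `c`. That same mechanism is what link farms exploit: enough manufactured votes raise a page's rank.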

What the Web Accelerator and the Toolbar do when they report what page you’re on is give Google traffic information almost as good as the web server’s logs, and sometimes better. This lets Google know how popular a page is; they don’t have to infer it from incoming links. They can then use that information to devalue sites that link farm as well as promote sites that are highly visited but not highly linked.

Google is watching you browse and using the information to organize the web better. I’m not going to tell you how to feel about that.

Now, for pages showing up on other people’s connections, that’s another part of HTTP. You need to look at Section 14.9 of RFC 2616, which explains the Cache-Control header. Personally, I hate dealing with caching proxy servers because they usually suck at following the standard, but if you build your app to standards then you can blame the proxy server. You’ll still probably have to come up with a workaround, but at least you have the moral high ground. In this case, the standard is to send the header Cache-Control: private on any response to a user who is signed in. Google should respect this header; if it doesn’t then you shouldn’t feel so bad about Google not hiring you.
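As a rough illustration of the header advice above, here is a minimal WSGI sketch that marks responses for signed-in users as private. The way it detects a signed-in user (a cookie named `session`), the `public, max-age=300` policy for anonymous visitors, and the function names are all assumptions made for the example, not anything the RFC or Google prescribes.

```python
def cache_control_for(signed_in):
    """Pick a Cache-Control value for a response."""
    if signed_in:
        # "private" tells shared caches (such as a proxy) not to store the
        # response; only the user's own browser cache may keep it.
        return "private"
    # Assumed policy for anonymous pages: shared caches may keep them briefly.
    return "public, max-age=300"


def app(environ, start_response):
    """Tiny WSGI application that sets Cache-Control per user."""
    # Hypothetical rule: any "session=" cookie means the user is signed in.
    signed_in = "session=" in environ.get("HTTP_COOKIE", "")
    body = b"hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", cache_control_for(signed_in)),
    ])
    return [body]
```

Keeping the decision in one small function means every handler in an application can share the same rule, which makes it harder to forget the header on a single personalized page.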

And then there’s also the controversy over the proxy breaking web applications. Google Web Accelerator doesn’t break web applications. If it’s following a link that deletes an item in your database, the application is broken. The application was built by a developer who doesn’t understand the difference between GET and POST. That would be like someone who doesn’t know the difference between RAM and a hard disk building desktop applications.

While web developers should read Section 9 of RFC 2616 to understand the difference between safe and idempotent methods, I’ll summarize: If a form or a link changes anything on the server, it should be called with POST. If not, GET.
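That rule can be enforced on the server side rather than left to convention. The sketch below is a deliberately tiny dispatcher, not a real framework: the `handle` function, the `delete:<name>` action format, and the in-memory set standing in for a database are all hypothetical.

```python
SAFE_METHODS = {"GET", "HEAD"}


def handle(method, action, items):
    """Dispatch one request against `items`, a set standing in for a database.

    Returns a (status_code, payload) pair.
    """
    if action == "list":
        # Reading changes nothing, so it is fine for any method, GET included.
        return 200, sorted(items)
    if action.startswith("delete:"):
        if method in SAFE_METHODS:
            # A crawler or prefetching proxy following a link lands here
            # and is refused before anything is modified.
            return 405, "Method Not Allowed"
        items.discard(action.split(":", 1)[1])
        # Redirect after a successful change so a reload does not repeat it.
        return 303, "See Other"
    return 404, "Not Found"
```

With this in place, a prefetcher that issues `handle("GET", "delete:a", items)` gets a 405 and the data survives, while a real form submission with POST performs the delete.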

I know a bunch of designers want text links that make changes because submit buttons are ugly. Unless the W3C adds a way to create post links in XHTML (which I haven’t seen an argument against adding) there are at least two ways to POST from a text link, neither really good but both better than the alternative of having an application break because a browser behaves correctly.

The first way is to make the link submit a form with JavaScript: <a href="#" onclick="document.getElementById('delete-form').submit(); return false;">Delete</a> (here delete-form is a placeholder id for a form on the page whose method is POST). This sucks because there’s no way for a browser that doesn’t support JavaScript to use it. Think that doesn’t matter? Tastemakers are using phones like the Sidekick to browse the web, and those phones don’t support JavaScript. If they can’t use your web app, they’re not going to convince their friends to.

The second way is what Instiki decided to do. Instiki used to have plain GET links that would cause a page to revert to an earlier version. When a search engine would index the site, it would follow those links and cause the page to roll back. The solution was to link the plain text link to a page with a form that would perform the rollback. And can you guess the method attribute of that form element? It was POST. And now you know the rest of the story.
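The pattern described above, a harmless GET link leading to a page whose form does the real work with POST, might be sketched like this. These are not Instiki's actual URLs or templates: the `/confirm_rollback` and `/rollback` paths, the parameter names, and both function names are invented for illustration.

```python
from html import escape
from urllib.parse import urlencode


def rollback_link(page, revision):
    """Plain text link. Following it only shows a confirmation page."""
    query = urlencode({"page": page, "rev": revision})
    return '<a href="/confirm_rollback?%s">Roll back</a>' % escape(query)


def confirmation_page(page, revision):
    """Confirmation form. Only submitting it (a POST) changes anything."""
    return (
        '<form method="post" action="/rollback">\n'
        '  <p>Roll back %s to revision %d?</p>\n'
        '  <input type="hidden" name="page" value="%s">\n'
        '  <input type="hidden" name="rev" value="%d">\n'
        '  <input type="submit" value="Roll back">\n'
        '</form>'
    ) % (escape(page), revision, escape(page), revision)
```

A search engine that follows the link from `rollback_link` just fetches the confirmation page and stops there, because spiders follow links rather than submit forms.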

Cool stuff I noticed about Google Maps

If you haven’t been to Google Maps yet,

  1. Get yourself some better RSS feeds
  2. Go to http://maps.google.com/

Here are some cool things I’ve noticed about Google Maps. I think this is going to be one of those posts I update a lot in a day.

  • The URLs are fairly clean. You can look up an address from your location bar by putting “http://maps.google.com/maps?q=” before it. For example: http://maps.google.com/maps?q=742 Evergreen Terrace, Springfield
    You can also specify the latitude and longitude by passing ll=$LAT,$LON where $LAT and $LON are decimals. That means you can make a bookmarklet that would show you the location of a blog based on its GeoURL. In fact, I did just that: Map GeoURL
  • They use semi-transparent PNGs for routes over street maps (do they get this to work correctly in IE?). That means they only have to dynamically generate route images, all the map images can be static.
    Also, they’re using XSL on the client side. From a brief glance it looks like the app uses XMLHttpRequest to query the map server, then renders the result with XSL (but I could be completely wrong). Update: as simple as possible, but no simpler has an in-depth look at the mechanics.
  • Google Local searches are based on what’s on the map by default. For instance, search for your address, clear the search box and search for pizza. Since the map is centered on your address, it will search around you. If you double click somewhere on the map to recenter and search again, it will use the new map center.

  • You can use the arrow keys on your keyboard to move around the map. + and - zoom.

  • On the driving directions, you can click on the step number to see a cool zoom of what you need to do for your turn.

  • Google owns Keyhole, who make a really cool product with pictures of the world. Hopefully those pictures will get integrated real soon.

  • Ted Mielczarek has written a Firefox extension to load the current Google map into Keyhole. I don’t have Keyhole (there’s a free demo, but not for the Mac) so I haven’t tested it and can’t vouch for it.
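
The URL patterns from the first item in the list above (the q= address lookup and the ll= latitude and longitude pair) can be built programmatically. This is only a sketch based on the two parameters described in this post: the helper names are made up, and nothing here guarantees how the service treats these URLs beyond what is noted above.

```python
from urllib.parse import urlencode

MAPS_BASE = "http://maps.google.com/maps"


def address_url(address):
    """Build a lookup URL for a street address using the q= parameter."""
    # urlencode escapes spaces and commas so the URL is safe to paste or link.
    return MAPS_BASE + "?" + urlencode({"q": address})


def latlon_url(lat, lon):
    """Build a URL for decimal coordinates using the ll= parameter."""
    return "%s?ll=%s,%s" % (MAPS_BASE, lat, lon)


print(address_url("742 Evergreen Terrace, Springfield"))
print(latlon_url(40.7, -74.0))
```

A GeoURL bookmarklet boils down to the second helper: read a page's latitude and longitude, then jump to the URL that `latlon_url` would produce.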

Google’s comment spam problem

Comment spam is a problem, but it’s not my problem. I get hit with comment spam, sure, but I’m only to blame in the sense that I have a weblog. The comment spammers are to blame, but they’re not alone in culpability. The one character missing is the one people are first to defend and last to blame: Google.

Google built a hell of a search engine with PageRank, and was rewarded with unbridled admiration and one of the world’s most liked brands. But PageRank isn’t perfect, and people have been gaming it for as long as it’s been around. Google actively tweaks their algorithm all the time to deal with spammers.

Originally spammers would post their own links in their own domains. While that was annoying in search results, that’s the only place you would encounter them. Google was able to combat that by avoiding junk sites, so the spammers moved on to adding links to weblogs.