Posts Tagged ‘hack’

Converting a data table on the web to an autocomplete translator with YQL and YUI

Monday, August 31st, 2009

During the Summer of Widgets hack event last weekend, Tomas Caspers, Nina Wieland and Jens Grochdreis had the idea of creating a translation tool to translate from the local Cologne accent to German and back.

For this, they found a pretty impressive data source on the web, namely this web site by Reinhard Kaaden. The task was now to turn this into a fancy interface that makes it easy for people to enter a “Kölsch” term and get the German equivalent and vice versa. For this, I proposed YQL and YUI, and here is a step-by-step explanation of how you can do it.

You can see the final outcome here: Deutsch-Kölsch Übersetzer

Step 1: Retrieve and convert the data

A very easy way to get data from the web is using YQL. To get the whole HTML of the source page, all we had to do was select * from html where url='http://www.magicvillage.de/~reinhard_kaaden/d-k.html'. That gave us the entire page, though, and we only wanted the content of the tables.

Using Firebug and looking up some XPath, we came up with the following expression that gives us the language pairs as German-Koelsch inside paragraphs: //table[1]/tr/td/p[not(a)]. The not(a) predicate is needed to filter out the A-Z navigation table cells. We chose JSON as the output format in YQL and dktrans as the callback function name.

All in all this gave us a URL that would load the data we wanted and send it to the function dktrans once it has been pulled:

<script type="text/javascript">
  // global store for everything the translator needs
  var dktransdata = {};
  // callback that YQL calls with the JSON result
  function dktrans(o){
    console.log(o); // we have data!
  }
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.magicvillage.de%2F~reinhard_kaaden%2Fd-k.html%22%20and%20xpath%3D%22%2F%2Ftable[1]%2Ftr%2Ftd%2Fp[not%28a%29]%22&format=json&env=http%3A%2F%2Fdatatables.org%2Falltables.env&callback=dktrans"></script>

I'm using a global object called dktransdata to store all the information we will need. This is necessary as the YUI Autocomplete needs global pointers to the data sources and autocomplete instances.
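If you want to build that URL in script rather than copy it from the YQL console, a minimal sketch could look like the following. The helper name buildYqlUrl is made up for this example, and it leaves out the env parameter on the assumption that the built-in html table does not need it.

```javascript
// Hypothetical helper (not part of the original hack): assembles the
// YQL request URL from the page URL, the XPath and the callback name.
function buildYqlUrl(pageUrl, xpath, callbackName) {
  var query = 'select * from html where url="' + pageUrl +
              '" and xpath="' + xpath + '"';
  return 'http://query.yahooapis.com/v1/public/yql' +
         '?q=' + encodeURIComponent(query) + // escape quotes, slashes, brackets
         '&format=json' +
         '&callback=' + encodeURIComponent(callbackName);
}

var yqlUrl = buildYqlUrl(
  'http://www.magicvillage.de/~reinhard_kaaden/d-k.html',
  '//table[1]/tr/td/p[not(a)]',
  'dktrans'
);
```

The resulting string can then be used as the src of a generated script element, which is all the second script tag above does.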

Step 2: Filter and convert the data

The next step was to turn the data we get back from YQL into usable data for the interface we planned to build. What we needed was an array of all the German terms and one of the Cologne terms for the autocomplete control and a way to match which one means the other. As the data set returned from YQL is a flat array of alternating German and Cologne terms, this was as easy as looping over the array in steps of two and seeding two arrays and two hash maps:

var t = o.query.results.p; // flat list: German, Cologne, German, Cologne...
dktransdata.koelsch = [];
dktransdata.deutsch = [];
dktransdata.dk = {}; // Cologne term -> German term
dktransdata.kd = {}; // German term -> Cologne term
for(var i=0;i<t.length;i+=2){
  dktransdata.koelsch.push(t[i+1]);
  dktransdata.deutsch.push(t[i]);
  dktransdata.dk[t[i+1]] = t[i];
  dktransdata.kd[t[i]] = t[i+1];
}

After this has executed I have two arrays, deutsch and koelsch, with the terms in each language and two objects, dk and kd, which map between the two languages: dk is keyed by the Cologne term and kd by the German one. So if I now read out dktransdata.kd['darunter'] the value is drunger, and dktransdata.dk['drunger'] gives darunter. This saves us from searching the arrays repeatedly to find the right terms.
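To try the pairing logic outside the page, here is a small stand-alone sketch of the same loop. The function name buildTranslationData is an assumption for this example, and the loop condition is slightly stricter than the one above so that a dangling last entry in an odd-length list is skipped.

```javascript
// Hypothetical stand-alone version of the loop in step 2.
// `terms` is a flat list: German, Cologne, German, Cologne...
function buildTranslationData(terms) {
  var data = { deutsch: [], koelsch: [], dk: {}, kd: {} };
  // stop before a dangling last entry if the list has an odd length
  for (var i = 0; i + 1 < terms.length; i += 2) {
    var german = terms[i];
    var cologne = terms[i + 1];
    data.deutsch.push(german);
    data.koelsch.push(cologne);
    data.dk[cologne] = german; // Cologne term -> German term
    data.kd[german] = cologne; // German term -> Cologne term
  }
  return data;
}

var sample = buildTranslationData(['darunter', 'drunger']);
// sample.kd['darunter'] is 'drunger', sample.dk['drunger'] is 'darunter'
```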

Step 3: Create the HTML for the autocomplete controls

The YUI Autocomplete needs two things to turn an input element into an autocomplete field: the field itself and a div to show the results in. Thus, all we needed was the following:

<div id="deutsch">
  <h2>Deutsch</h2>
  <input id="deutschinput" type="text">
  <div id="deutschoutput"></div>
</div>

<div id="koelsch">
  <h2>K&ouml;lsch</h2>
  <input id="koelschinput" type="text">
  <div id="koelschoutput"></div>
</div>

Step 4: Include the YUI and instantiate the autocomplete controls

The next step was to get the right YUI files to convert the input elements into autocomplete controls. The easiest way to do that is to use the configurator.

Using this, we get a stub that will load all the YUI components we need and we can put the code we want to execute once that is done inside the onSuccess handler:

<script type="text/javascript" src="http://yui.yahooapis.com/2.7.0/build/yuiloader/yuiloader-min.js"></script>
<script type="text/javascript" charset="utf-8">
  var loader = new YAHOO.util.YUILoader({
    base: "",
    require: ["autocomplete"],
    loadOptional: false,
    combine: true,
    filter: "MIN",
    allowRollup: true,
    onSuccess: function() {
      // code to use YUI goes here
    }
  });
  loader.insert();
</script>

All that had to go in there to create the Autocomplete controls was more or less 100% copied from the simple Autocomplete example on the YUI site.
The first thing is to get references to the input fields I want to populate with the translation data:

var di = YAHOO.util.Dom.get('deutschinput');
var ci = YAHOO.util.Dom.get('koelschinput');

Then you need to instantiate the data source for the autocomplete and give it the language array. As a responseSchema you can define a field called term:

dktransdata.cologneDS = new YAHOO.util.LocalDataSource(
  dktransdata.koelsch
);
dktransdata.cologneDS.responseSchema = {fields:['term']};

Next you need to instantiate the AutoComplete widget. This one gets three parameters: the input element, the output container and the data source. You can set useShadow to get a small dropshadow on the container:

dktransdata.cologneAC = new YAHOO.widget.AutoComplete(
  'koelschinput','koelschoutput',dktransdata.cologneDS
);
dktransdata.cologneAC.useShadow = true;

This turns the input of the Cologne language into an Autocomplete, but it doesn’t yet populate the other field. For this we need to subscribe to the itemSelectEvent of the AutoComplete widget. The handler of that event gets a few parameters; the text content of the chosen element is the first item of the third item in the second parameter (this is explained in detail on the YUI site). All you need to do is set the value of the other field to the corresponding entry of the translation maps we defined:

dktransdata.cologneAC.itemSelectEvent.subscribe(cologneHandler);
function cologneHandler(s,a){
  di.value = dktransdata.dk[a[2][0]];
}

All that is left is to do the same for the German to Cologne field:

dktransdata.germanDS = new YAHOO.util.LocalDataSource(
  dktransdata.deutsch
);
dktransdata.germanDS.responseSchema = {fields:['term']};
dktransdata.germanAC = new YAHOO.widget.AutoComplete(
  'deutschinput','deutschoutput',dktransdata.germanDS
);
dktransdata.germanAC.useShadow = true;
dktransdata.germanAC.itemSelectEvent.subscribe(germanHandler);
function germanHandler(s,a){
  ci.value = dktransdata.kd[a[2][0]];
}
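Both handlers follow the same pattern, so they could also come from one small factory. This is only a sketch: makeTranslateHandler is a made-up name, and the layout of the second parameter (instance, selected list item, result data) is my reading of the YUI 2 documentation, so treat it as an assumption. A plain object stands in for the input element to keep the example runnable outside the browser.

```javascript
// Hypothetical factory: returns an itemSelectEvent handler that looks up
// the selected term in `map` and writes the translation to `target`.
function makeTranslateHandler(map, target) {
  return function (type, args) {
    // assumed layout: args[0] AutoComplete instance, args[1] selected <li>,
    // args[2] result data, whose first entry is the term text
    var term = args[2][0];
    var known = Object.prototype.hasOwnProperty.call(map, term);
    // clear the other field instead of writing "undefined" into it
    target.value = known ? map[term] : '';
  };
}

// Simulated selection, with a plain object in place of the input element
var fakeInput = { value: '' };
var handler = makeTranslateHandler({ drunger: 'darunter' }, fakeInput);
handler('itemSelect', [null, null, ['drunger']]);
```

In the page itself this would be wired up as dktransdata.cologneAC.itemSelectEvent.subscribe(makeTranslateHandler(dktransdata.dk, di)), and the same with kd and ci for the German field.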

Step 5: Putting it all together

You can see the full source of the translation tool on GitHub and can download it there, too.
Of course we are not really finished here as this only works in JavaScript environments. As the translator was meant to be a widget though, this was not an issue. That the autocomplete does not seem to work on mobiles is one, though :).

Making this work without JavaScript would be pretty easy, too. As the data is returned in JSON we can also use this in PHP and write a simple form script. If wanted, I can do that later.

TTMMHTM: Evangelist Handbook, Billboard charts API, collaborative editing, IE6 bashing, pretty JSON, fancy fast food and terrible bugs.

Friday, July 24th, 2009

Things that made me happy this morning

TTMMHTM: Braille body mods, Tesco hack, Placemaker talk video, old superheroes, steampunk, accessible Opera and cat control

Tuesday, July 14th, 2009

Things that made me happy this morning:

Geo this! Geolocate WordPress posts with Greasemonkey and Yahoo Placemaker

Monday, June 22nd, 2009

Geolocating content on the web is a great idea. By embedding latitude and longitude and real place names in your document you allow data mining for location or easy display on a map.

The problem up to now was that it is quite a job to find out the correct geo information from a text or a document and it is quite a pain to enter the information by hand.

Yahoo Placemaker is a web service that helps you with that – you give it some text or a document URL and it returns everything it found in there that resembles a geographical location. The issue with doing that on a live site is that you slow down your site immensely, as you need to do the lookup every time.

The more logical place to do the lookup with Placemaker is when you edit your document. I thought this would be cool to have for this WordPress install here and wrote a small GreaseMonkey script that injects a new “Geo this!” button in the main WP form.

When I hit the button the script does an Ajax request using the Placemaker open YQL table to get the information for the currently edited text.

Once it has found the information, it adds it at the end of the document as a GEO microformat. Each found entry starts with a comment that tells you what Placemaker matched and considered a geographical location. As it is not infallible, this makes it easy for you to delete wrong entries.
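To give an idea of what such an entry could look like, here is a rough sketch of a formatter. The function name and the input shape (name, latitude, longitude) are assumptions for illustration, not the script's actual code or Placemaker's response format; the class names geo, latitude and longitude are the ones the geo microformat defines.

```javascript
// Hypothetical formatter: turns one matched place into a comment plus a
// geo microformat. The {name, latitude, longitude} shape is an assumption.
function toGeoMicroformat(place) {
  return '<!-- Placemaker matched: ' + place.name + ' -->\n' +
         '<span class="geo">' +
         '<span class="latitude">' + place.latitude + '</span>, ' +
         '<span class="longitude">' + place.longitude + '</span>' +
         '</span>';
}

var snippet = toGeoMicroformat({
  name: 'Sunnyvale, California, United States',
  latitude: '37.371',
  longitude: '-122.038'
});
```

Keeping the comment on its own line in front of the markup is what makes a wrong match easy to spot and delete in the editor.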

Try it out yourself:

This is pretty much rough and ready and I’d be happy for feedback on how to improve it.

Postcode from latitude and longitude or even IP – fun with Geo APIs and YQL

Tuesday, June 9th, 2009

One of the more complex things about GeoFill was to get postcode information from an IP. However, with a collection of APIs and a collated YQL statement even this was possible.

The first thing I needed to get was the IP of the user. This is done with the GeoIP API based on the GeoLite API from MaxMind. This is available as an open table in YQL and can be used thus:

select * from ip.location where ip=""

Try the lookup in the console or check the lookup result

"Response": {
  "Ip": "216.39.58.17",
  "Status": "OK",
  "CountryCode": "US",
  "CountryName": "United States",
  "RegionCode": "06",
  "RegionName": "California",
  "City": "Sunnyvale",
  "ZipPostalCode": "94089",
  "Latitude": "37.4249",
  "Longitude": "-122.007",
  "Gmtoffset": "-8.0",
  "Dstoffset": "-7.0"
}

This gives us a lot of information. What’s really important here is latitude and longitude, as this can be used in the flickr.places API to get a where on earth ID which is a much more defined identifier:

select * from flickr.places where (lat,lon) in (
  select Latitude,Longitude from ip.location where ip=""
)

Try the flickr places call in the console or check the flickr result

"places": {
  "accuracy": "16",
  "latitude": "37.4249",
  "longitude": "-122.007",
  "total": "1",
  "place": {
    "latitude": "37.371",
    "longitude": "-122.038",
    "name": "Sunnyvale, California, United States",
    "place_id": "P_ls_fybBJwdHP8t",
    "place_type": "locality",
    "place_type_id": "7",
    "place_url": "/United+States/California/Sunnyvale",
    "timezone": "America/Los_Angeles",
    "woeid": "2502265"
  }
}

Here the interesting part is the woeid which we can use to dig deeper into geo.places:

select * from geo.places where woeid in (
  select place.woeid from flickr.places where (lat,lon) in (
    select Latitude,Longitude from ip.location where ip=""
  )
)

Try the geo places call in the console or check the geo places result
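The nesting reads more easily when it is assembled from the inside out. The following is a minimal sketch with a made-up helper name, buildPostcodeQuery; the IPv4 check is my own addition so that nothing unexpected ends up inside the statement.

```javascript
// Hypothetical helper that nests the three lookups for a given IP.
function buildPostcodeQuery(ip) {
  // only accept something shaped like an IPv4 address
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
    throw new Error('not an IPv4 address: ' + ip);
  }
  // innermost: IP -> latitude and longitude
  var latLon = 'select Latitude,Longitude from ip.location where ip="' + ip + '"';
  // middle: latitude and longitude -> where on earth ID
  var woeid = 'select place.woeid from flickr.places where (lat,lon) in (' + latLon + ')';
  // outermost: where on earth ID -> full place information
  return 'select * from geo.places where woeid in (' + woeid + ')';
}

var statement = buildPostcodeQuery('216.39.58.17');
```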

The result is all the information you’d ever want.

"place": {
  "lang": "en-US",
  "xmlns": "http://where.yahooapis.com/v1/schema.rng",
  "yahoo": "http://www.yahooapis.com/v1/base.rng",
  "uri": "http://where.yahooapis.com/v1/place/28751237",
  "woeid": "28751237",
  "placeTypeName": {
    "code": "22",
    "content": "Suburb"
  },
  "name": "Fairgrounds",
  "country": {
    "code": "US",
    "type": "Country",
    "content": "United States"
  },
  "admin1": {
    "code": "US-CA",
    "type": "State",
    "content": "California"
  },
  "admin2": {
    "code": "",
    "type": "County",
    "content": "Santa Clara"
  },
  "admin3": null,
  "locality1": {
    "type": "Town",
    "content": "San Jose"
  },
  "locality2": {
    "type": "Suburb",
    "content": "Fairgrounds"
  },
  "postal": {
    "type": "Zip Code",
    "content": "95112"
  },
  "centroid": {
    "latitude": "37.326611",
    "longitude": "-121.878441"
  },
  "boundingBox": {
    "southWest": {
      "latitude": "37.275379",
      "longitude": "-121.89254"
    },
    "northEast": {
      "latitude": "37.330879",
      "longitude": "-121.808723"
    }
  }
}
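Since the postcode was the whole point, here is a small sketch of how it could be read from such a response. The function name is made up, and the query.results.place envelope is an assumption based on how other YQL JSON results are wrapped; the array check covers the case where more than one place comes back, which is also an assumption.

```javascript
// Hypothetical accessor: returns the postcode from a geo.places response,
// or null if there is none. The query.results envelope is an assumption.
function postcodeFromResponse(o) {
  var place = o && o.query && o.query.results && o.query.results.place;
  // take the first entry if several places came back
  if (place instanceof Array) {
    place = place[0];
  }
  return place && place.postal ? place.postal.content : null;
}

var postcode = postcodeFromResponse({
  query: { results: { place: {
    name: 'Fairgrounds',
    postal: { type: 'Zip Code', content: '95112' }
  } } }
});
```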

Christian Heilmann's blog – Wait till I come! is the blog of Christian Heilmann, a developer evangelist living and working in London, England.
