Loading external content with Ajax using jQuery and YQL
Let’s solve the problem of loading external content (on other domains) with Ajax in jQuery. All the code you see here is “available on GitHub”:http://github.com/codepo8/crossdomain-ajax-with-jquery-and-yql and “can be seen on this demo page”:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/index.html so no need to copy and paste!
OK, Ajax with jQuery is very easy to do – like most solutions it is a few lines:
$(document).ready(function(){
$('.ajaxtrigger').click(function(){
$('#target').load('ajaxcontent.html');
});
});
Check out this “simple and obtrusive Ajax demo”:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/simple-ajax.html to see what it does.
This will turn all elements with the class of ajaxtrigger into triggers to load “ajaxcontent.html” and display its contents in the element with the ID target.
This is terrible, as most of the time it means that people will use pointless links like <a href="#">click me</a>, but that is not the problem for today. I am working on a larger article with all the goodies about Ajax usability and accessibility.
However, to make this more re-usable we could do the following:
$(document).ready(function(){
$('.ajaxtrigger').click(function(){
$('#target').load($(this).attr('href'));
return false;
});
});
You can then use <a href="ajaxcontent.html" class="ajaxtrigger">load some content</a> to load the content and you make the whole thing re-usable.
Check out this “more reusable Ajax demo”:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/reusable-ajax.html to see what it does.
The issue I wanted to find a nice solution for is the one that happens when you click on the second link in the demo: loading external files fails because the browser’s same-origin policy doesn’t allow Ajax requests to other domains. This means that <a href="http://icant.co.uk/" class="ajaxtrigger">see my portfolio</a> will fail to load the Ajax content, and fail silently at that. You can click the link until you are blue in the face but nothing happens. A dirty hack to avoid this is to let the browser load the document itself whenever somebody tries to follow an external link.
Check out this “allowing external links to be followed”:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/allowing-link-following.html to see what it does.
$(document).ready(function(){
$('.ajaxtrigger').click(function(){
var url = $(this).attr('href');
if(url.match('^http')){
return true;
} else {
$('#target').load(url);
return false;
}
});
});
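The ^http check treats every absolute URL as external, even an absolute link back to your own site. A minimal sketch of a stricter test, assuming the standard URL constructor is available (current browsers and Node have it); isExternalUrl and its pageUrl parameter are hypothetical names, not part of the demo code:

```javascript
// Hypothetical helper: decide whether a link points to another origin.
// pageUrl is the address of the current page
// (in a browser you would pass window.location.href).
function isExternalUrl(href, pageUrl) {
  try {
    // Resolve relative links against the current page first.
    var target = new URL(href, pageUrl);
    var page = new URL(pageUrl);
    // Same protocol, host and port means the link is local.
    return target.origin !== page.origin;
  } catch (e) {
    // Links that cannot be parsed are treated as external,
    // so the browser handles them instead of Ajax.
    return true;
  }
}
```

Inside the click handler this could replace url.match('^http'), for example as isExternalUrl(url, window.location.href).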
Proxying with PHP
If you look around the web you will find that the solution in most cases is a proxy script in PHP (or any other server-side language). Something using cURL could be, for example, proxy.php:
<?php
$url = $_GET['url'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
?>
People then could use this with a slightly changed script (“using a proxy”:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/using-proxy.html):
$(document).ready(function(){
$('.ajaxtrigger').click(function(){
var url = $(this).attr('href');
if(url.match('^http')){
url = 'proxy.php?url=' + url;
}
$('#target').load(url);
return false;
});
});
It is, however, a spectacularly stupid idea to have a proxy script like that. Without filtering, people can use it to load any document on your server and display it in the page (simply use Firebug to change the link to point at anything on your server), they can use it to inject a mass-mailer script into your document, or they can simply use it to redirect to any other web resource and make it look like your server was the one that sent it. It is spammer’s heaven.
Use a white-listing and filtering proxy!
So if you want to use a proxy, make sure to white-list the allowed URIs. Furthermore it is a good plan to get rid of everything but the body of the other HTML document. Another good idea is to filter out scripts. This prevents display glitches and keeps scripts you don’t want from being executed on your site.
Something like this:
<?php
// the URL to load is passed in as ?url=...
$url = isset($_GET['url']) ? $_GET['url'] : '';
// only these exact URLs may be loaded through the proxy
$allowedurls = array(
  'http://developer.yahoo.com',
  'http://icant.co.uk'
);
if(in_array($url,$allowedurls)){
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $output = curl_exec($ch);
  curl_close($ch);
  // keep only what is inside the body element
  $content = preg_replace('/.*<body[^>]*>/msi','',$output);
  $content = preg_replace('/<\/body>.*/msi','',$content);
  $content = preg_replace('/<\/?body[^>]*>/msi','',$content);
  // turn line breaks into spaces so words do not run together
  $content = preg_replace('/[\r\n]+/msi',' ',$content);
  // remove HTML comments, noscript blocks and scripts
  $content = preg_replace('/<!--[\S\s]*?-->/msi','',$content);
  $content = preg_replace('/<noscript[^>]*>[\S\s]*?<\/noscript>/msi',
                          '',$content);
  $content = preg_replace('/<script[^>]*>[\S\s]*?<\/script>/msi',
                          '',$content);
  $content = preg_replace('/<script[^>]*\/>/msi','',$content);
  echo $content;
} else {
  echo 'Error: URL not allowed to load here.';
}
?>
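Note that the in_array() check above only admits the listed URLs themselves, not pages below them. If the intent is to allow whole sites, the comparison can be done on the host name instead. A rough sketch of that logic, written in JavaScript to match the rest of the article (in PHP the same idea could be built on parse_url()); isAllowedUrl and allowedHosts are hypothetical names, and the check still has to run on the server to be of any use:

```javascript
// Hypothetical allow-list of host names, mirroring the two sites
// used in the PHP example.
var allowedHosts = ['developer.yahoo.com', 'icant.co.uk'];

function isAllowedUrl(url) {
  var parsed;
  try {
    parsed = new URL(url);
  } catch (e) {
    // Not an absolute URL (for example a local file path): reject.
    return false;
  }
  // Only plain web protocols are accepted, so file: and friends are out.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return false;
  }
  // The host has to match an entry exactly; a look-alike such as
  // icant.co.uk.evil.example does not pass.
  return allowedHosts.indexOf(parsed.hostname) !== -1;
}
```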
Pure JavaScript solution using YQL
But what if you have no server access or you want to stay in JavaScript? Not to worry – it can be done. YQL allows you to load any HTML document and get it back in JSON. As jQuery has a nice interface to load JSON, the two can be combined to achieve what we want.
Getting HTML from YQL is as easy as using:
select * from html where url="http://icant.co.uk"
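To send that statement to YQL it has to be URL-encoded and appended to the endpoint. A small sketch of how this could be wrapped up, assuming the public endpoint used later in this article; buildYqlUrl is a hypothetical helper name:

```javascript
// Hypothetical helper: build the request URL for the YQL html table.
function buildYqlUrl(pageUrl, format) {
  // A double quote inside the address would end the YQL string early,
  // so it is escaped before the address is embedded.
  var safeUrl = String(pageUrl).replace(/"/g, '%22');
  var query = 'select * from html where url="' + safeUrl + '"';
  return 'http://query.yahooapis.com/v1/public/yql' +
    '?q=' + encodeURIComponent(query) +
    '&format=' + encodeURIComponent(format || 'xml') +
    // jQuery replaces the ? with a generated callback name (JSON-P)
    '&callback=?';
}
```

The result can be handed straight to jQuery’s getJSON().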
YQL does a few things extra for us:
- It loads the HTML document and sanitizes it
- It runs the HTML document through HTML Tidy to remove things that nasty frameworks like .NET considered markup.
- It caches the HTML for a while
- It only returns the body content of the HTML - so no styling (other than inline styles) will get through.
As output formats you can choose XML or JSON. If you define a callback parameter for JSON you get JSON-P with all the HTML as a JavaScript Object – not fun to re-assemble:
foo({
"query":{
"count":"1",
"created":"2010-01-10T07:51:43Z",
"lang":"en-US",
"updated":"2010-01-10T07:51:43Z",
"uri":"http://query.yahoo[...whatever...]k%22",
"results":{
"body":{
"div":{
"id":"doc2",
"div":[{"id":"hd",
"h1":"icant.co.uk - everything Christian Heilmann"
},
{"id":"bd",
"div":[
{"div":[{"h2":"About this and me","[... and so on...]
}}}}}}}});
When you define a callback with the XML output you get a function call with the HTML data as string in an Array – much easier:
foo({
"query":{
"count":"1",
"created":"2010-01-10T07:47:40Z",
"lang":"en-US",
"updated":"2010-01-10T07:47:40Z",
"uri":"http://query.y[...who cares...]%22"},
"results":[
"<body>\n <div id=\"doc2\">\n <div id=\"hd\">\n
<h1>icant.co.uk - everything Christian Heilmann<\/h1>\n
... and so on ..."
]
});
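Reading the HTML out of that second structure takes very little code. A defensive sketch, assuming the response has the shape shown above; extractHtml is a hypothetical helper name:

```javascript
// Hypothetical helper: pull the HTML string out of a YQL response
// that was requested with format=xml and a callback.
function extractHtml(response) {
  var results = response && response.results;
  if (results && results.length > 0 && typeof results[0] === 'string') {
    return results[0];
  }
  // Nothing usable came back, for example because the page
  // could not be loaded.
  return null;
}
```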
Using jQuery’s getJSON() method and accessing the YQL endpoint this is easy to implement:
$.getJSON("http://query.yahooapis.com/v1/public/yql?"+
  "q=select%20*%20from%20html%20where%20url%3D%22"+
  encodeURIComponent(url)+
  "%22&format=xml&callback=?",
  function(data){
    // results holds the HTML as a string in an array
    if(data.results && data.results[0]){
      var html = filterData(data.results[0]);
      container.html(html);
    } else {
      var errormsg = '<p>Error: could not load the page.</p>';
      container.html(errormsg);
    }
  }
);
Putting it all together you have a “cross-domain Ajax solution with jQuery and YQL“:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/using-yql.html:
$(document).ready(function(){
  var container = $('#target');
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });
  function doAjax(url){
    // if it is an external URI
    if(url.match('^http')){
      // call YQL
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
        "q=select%20*%20from%20html%20where%20url%3D%22"+
        encodeURIComponent(url)+
        "%22&format=xml&callback=?",
        // this function gets the data from the successful
        // JSON-P call
        function(data){
          // if there is data, filter it and render it out
          if(data.results && data.results[0]){
            var html = filterData(data.results[0]);
            container.html(html);
          // otherwise tell the world that something went wrong
          } else {
            var errormsg = '<p>Error: could not load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    // if it is not an external URI, use Ajax load()
    } else {
      container.load(url);
    }
  }
  // filter out some nasties
  function filterData(data){
    // remove the opening and closing body tags
    data = data.replace(/<\/?body[^>]*>/g,'');
    // turn line breaks into spaces so words do not run together
    data = data.replace(/[\r\n]+/g,' ');
    // remove HTML comments, noscript blocks and scripts
    data = data.replace(/<!--[\S\s]*?-->/g,'');
    data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
    data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
    data = data.replace(/<script[^>]*\/>/g,'');
    return data;
  }
});
This is rough and ready of course. A real Ajax solution should also consider timeout and not found scenarios. Check out the “full version with loading indicators, error handling and yellow fade”:http://icant.co.uk/articles/crossdomain-ajax-with-jquery/error-handling.html for inspiration.
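As a starting point for the timeout case: a JSON-P request that never gets an answer may not report an error at all, so one option is to race the success callback against a timer. A minimal sketch under that assumption; withTimeout is a hypothetical helper, and the two optional timer parameters exist only so the logic can be tried out without waiting:

```javascript
// Hypothetical helper: wrap a success callback so that either it runs
// in time or the timeout handler runs instead, but never both.
// setTimer and clearTimer default to the global timer functions.
function withTimeout(onSuccess, onTimeout, ms, setTimer, clearTimer) {
  setTimer = setTimer || setTimeout;
  clearTimer = clearTimer || clearTimeout;
  var settled = false;
  var timer = setTimer(function () {
    if (!settled) {
      settled = true;
      onTimeout();
    }
  }, ms);
  return function (data) {
    if (settled) {
      return; // the timeout already fired, ignore the late response
    }
    settled = true;
    clearTimer(timer);
    onSuccess(data);
  };
}
```

With jQuery this could be used as $.getJSON(url, withTimeout(render, showError, 5000)), where render and showError are placeholders for your own functions.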
Tags: ajax, crossdomain, javascript, jquery, php, proxy, yql


January 10th, 2010 at 10:01 pm
good stuff Chris!
fyi, some of the test pages did not work ok with Chrome3 on Win:
/reusable-ajax.html
/allowing-link-following.html
- Aaron
January 10th, 2010 at 10:06 pm
what didn’t work? Loading of the external file? That would be the whole idea of the article :)
January 10th, 2010 at 10:19 pm
Yes, loading the Y! Dev page
January 10th, 2010 at 10:46 pm
Everything works as it should with Chrome 4.0.249.43 on Ubuntu 9.04…
Nice article. Thanks!
January 11th, 2010 at 1:52 am
Really nice article, I particularly liked this little filterData method, which is pure proof of concern on the security side. All those words to say I liked it.
Cheers.
January 12th, 2010 at 2:17 pm
One thing to notice though that overwriting the link with a click event in the more reusable ajax demo will break the browser in some scenarios by disallowing to open a new tab. As a believer of progressive enhancement, a link should still follow through if javascript is broken. Which means that the server has to implement two scenarios when returning html. If the request is an ajax call, just return an html snippet. If it is not, return the page. If the script is applied in this scenario, the user will no longer be able to press Ctrl+Click to open the page in a new tab. To avoid this, check whether the event has the metakey attached to it. If so, return true.
February 17th, 2010 at 5:01 pm
Interesting idea but doesnt usefully :(
February 19th, 2010 at 2:22 pm
Hello,
this seems to fail in FF 3.5.8 OSX but works in Safari 4.0.4 on the same machine.
Does jQuery use a normal .load in this instance, and the same origin policy kicks in?
March 4th, 2010 at 10:31 pm
Good article for beginners.
I tried to retrieve the xml content from the following http://entheros.amnh.org:80/digir/DiGIR.php
but only the textcontent were obtained how do I get the entire xml?
March 12th, 2010 at 1:39 pm
Hi, I wanted to make a bookmarklet/userscript to load full posts instead of the excerpt in some feeds in Google Reader.
http://jsbin.com/oxage3/edit (I removed some features to make it easier to read)
This works (if you first “jQuerify” the page) as a bookmarklet, but for some reason some words get concatenated: http://droplr.com/AdGf
Do you have any idea of why this happens?
March 19th, 2010 at 8:04 pm
Good job. Thanks a lot.
March 23rd, 2010 at 9:36 pm
Hey there, I’m fairly new to javascript and whatnot, but I’ve got this project where I need to gather a schedule for a school application for the iPhone. (It’s written with JQtouch and converted by phonegap, thusly letting us write it in javascript/html)
So the problem is that the schedule is in .aspx (I barely know what that is to begin with) and when using this godsend of a script, all it returns is some crazy jibberish.
Has anyone got any experience / input on the matter? :x