Hacker News
by krapp 4542 days ago
The json_encode one is particularly annoying, as you can't even consider sending JSON to the browser and using it in JS without all the flags on, especially if you're dealing with user data or scraping websites.

Just out of curiosity, could you expand on that a little more? I'm currently in the process of ripping my hair out with json_encode issues.

We send back a json_encoded array containing a property "html" with HTML markup to be injected back into a contenteditable div. Originally I had a ton of problems whenever an invisible Unicode character was in there (PHP's string functions could not find or replace this character no matter what I tried). However, this is the first I've seen of the flags the GP mentioned.

If you look up the json_encode function on php.net, it lists the flags. As I understand it, there's a difference between the encoding allowed in JavaScript and the encoding allowed in JSON, which means certain characters that would be valid in a JavaScript object don't necessarily work in JSON, so you end up having to escape everything possible. There's one flag (JSON_UNESCAPED_UNICODE, added in PHP 5.4) which doesn't work on the version of PHP I develop on, so I ended up applying code listed in the comments there. This also adds some cruft similar to what Google, Facebook, etc. use, if you want it. I haven't had any issues with it yet but, being what it is, there may be a more elegant way to accomplish it.

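    function AsJSON($arr, $cruft = null) {
        // The convmap starts at 0x80, so it matches every multibyte code point
        // above ASCII 127 (up to 0xffff). Those characters are turned into
        // numeric entities and so "hidden" from json_encode's \uXXXX escaping.
        $convmap = array(0x80, 0xffff, 0, 0xffff);

        array_walk_recursive($arr, function (&$item, $key) use ($convmap) {
            if (is_string($item)) {
                $item = mb_encode_numericentity($item, $convmap, 'UTF-8');
            }
        });

        // Caveat: JSON_HEX_AMP rewrites the "&" of each entity as \u0026, so the
        // decode below finds no entities; the output then carries \u0026#NNN;
        // sequences instead of the raw characters.
        $JSON = mb_decode_numericentity(
            json_encode($arr, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP),
            $convmap,
            'UTF-8'
        );

        if (json_last_error() === JSON_ERROR_NONE) {
            return $cruft . $JSON;
        }
        return null; // encoding failed
    }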
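
On PHP 5.4 or later, something similar may be reachable with the built-in flags alone. Here is a minimal sketch, assuming the goal is JSON that keeps multibyte characters readable while hex-escaping the characters that matter when the result is embedded in a page. The name as_json_builtin is hypothetical and does not come from the thread.

```php
<?php
// Hypothetical helper: as_json_builtin() is an illustrative name, not from the thread.
function as_json_builtin(array $data, string $cruft = ''): ?string
{
    // Hex-escape <, >, ', " and &, but leave multibyte characters as UTF-8.
    $flags = JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP
           | JSON_UNESCAPED_UNICODE;

    $json = json_encode($data, $flags);

    // json_encode returns false on failure (for example, invalid UTF-8 input).
    if ($json === false) {
        return null;
    }
    return $cruft . $json;
}

// Usage: the payload round-trips through json_decode unchanged.
$payload = ['html' => '<b>café & "tea"</b>'];
$out = as_json_builtin($payload);
var_dump(json_decode($out, true) === $payload);
```

Whether this is a drop-in replacement depends on what consumes the output, so treat it as a starting point rather than a fix.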
You might also consider looking at the ctype_* functions if you haven't.
Seeing the type of content, the character in question might very well be a BOM.

You can identify it with pack("CCC", 0xef, 0xbb, 0xbf), which produces the three BOM bytes "\xEF\xBB\xBF".
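
If the invisible character really is a UTF-8 BOM, a byte-level replace is one way to get rid of it. A minimal sketch, assuming the markup is a UTF-8 string and that the BOM can be dropped wherever it occurs; strip_utf8_bom is a hypothetical name.

```php
<?php
// Hypothetical helper: strip_utf8_bom() is an illustrative name.
function strip_utf8_bom(string $text): string
{
    // Same three bytes as pack("CCC", 0xef, 0xbb, 0xbf).
    $bom = "\xEF\xBB\xBF";

    // str_replace works on raw bytes, so it finds the sequence
    // at the start of the string or anywhere inside it.
    return str_replace($bom, '', $text);
}

$markup = "\xEF\xBB\xBF<p>hello</p>";
var_dump(strip_utf8_bom($markup) === '<p>hello</p>');
```

The double-quoted "\xEF\xBB\xBF" literal matters here: in single quotes PHP would treat the backslashes literally and the search would never match.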

define('OH_MY_THOSE_FLAG', FLAG | FLAG | FLAG | FLAG);

But then again, my name gives me special powers.

Nevertheless...
Yeah, but as with many complaints about PHP, how would you define the default behavior? I see so many people who expect PHP to always do things the way they want because of how easy it is to work with. I never read those complaints about super tricky C cases.

Modifying the content by default would be completely counter-intuitive, especially for a function that is used mostly in APIs, where the data is encoded and decoded in different places and languages.

IMO content filtering is specific to the view where you output it, and should be done there. Any hazardous content should always go through a sanitize function when you echo it in the middle of HTML.
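
As a rough illustration of escaping at the point of output, here is a minimal sketch that assumes the value is being echoed into HTML body text. The helper name e is hypothetical; htmlspecialchars and its flags are standard PHP.

```php
<?php
// Hypothetical helper: e() is an illustrative name for "escape for HTML output".
function e(string $value): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces
    // invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<script>alert("hi")</script>';
echo '<p>' . e($comment) . '</p>', "\n";
// prints <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```

The stored data stays untouched; only the HTML view escapes it, which is the separation the comment argues for.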