The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

by Joel Spolsky, Wednesday, October 08, 2003. Ever wonder about that mysterious Content-Type tag? You know, the one you're supposed to put in HTML and you never quite know what it should be? Did you ever get an email from your friends in Bulgaria with the subject line "???? ??????"? I've been dismayed to discover just how many software developers aren't really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. But it won't. So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. And one more thing: In this article I'll fill you in on exactly what every working programmer should know. A Historical Perspective The easiest way to understand this stuff is to go chronologically. And all was good, assuming you were an English speaker. Unicode Hello Encodings

Daily Builds Are Your Friend by Joel Spolsky, Saturday, January 27, 2001. In 1982, my family took delivery of the very first IBM-PC in Israel. We actually went down to the warehouse and waited while our PC was delivered from the port.

Regex Tutorial - Unicode Characters and Properties Unicode is a character set that aims to define all characters and glyphs from all human languages, living and dead. With more and more software being required to support multiple languages, or even just any language, Unicode has been strongly gaining popularity in recent years. Using different character sets for different languages is simply too cumbersome for programmers and users.

OSI model The Open Systems Interconnection model (OSI) is a conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers. The model is a product of the Open Systems Interconnection project at the International Organization for Standardization (ISO), maintained by the identification ISO/IEC 7498-1. The model groups communication functions into seven logical layers.

You aren't gonna need it "You aren't gonna need it"[1][2] (acronym: YAGNI)[3] is a principle of extreme programming (XP) that states a programmer should not add functionality until deemed necessary.[4] Ron Jeffries writes, "Always implement things when you actually need them, never when you just foresee that you need them."[5] The phrase also appears altered as, "You aren't going to need it"[6][7] or sometimes phrased as, "You ain't gonna need it". YAGNI is a principle behind the XP practice of "do the simplest thing that could possibly work" (DTSTTCPW).[2][3] It is meant to be used in combination with several other practices, such as continuous refactoring, continuous automated unit testing and continuous integration. Used without continuous refactoring, it could lead to messy code and massive rework.

Parsing huge file without reading into memory (Performance forum at JavaRanch) Originally posted by David Harkness: Not if you set the ByteBuffer's position and limit before decoding it. Loop over the mapped buffer, setting up a good block size using position and limit. Decoding will now just decode the bytes in the range you specify. Use CharsetDecoder.decode(ByteBuffer, CharBuffer) or one of the other similar methods so you can reuse the same CharBuffer. Since decoding advances the position, it should leave you at the next correct spot, dealing with multi-byte character encodings for you; just set limit to be position + BLOCK_SIZE and keep going. If you want ultimate speed, cannot count on ASCII files, and don't want to write your own specialized decoder, this is the way to go.
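The loop described in that post might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name BlockDecoder, the helper names decodeInBlocks, decodeFile and drain, the 8192-byte BLOCK_SIZE and the 16-byte minimum window are all made up for this example, and it uses the three-argument CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean) overload, which is the JDK's public form of the call the post refers to. A single FileChannel.map call cannot cover more than Integer.MAX_VALUE bytes, so a truly huge file would need several mappings; that part is left out here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical helper class; the names are illustrative, not from the forum post.
final class BlockDecoder {
    static final int BLOCK_SIZE = 8192; // assumed block size, tune as needed

    // Decodes everything between in.position() and in.limit(), one window at a time.
    // The sink sees the same CharBuffer on every call and must consume it immediately.
    static void decodeInBlocks(ByteBuffer in, Charset charset, int blockSize,
                               Consumer<CharBuffer> sink) throws IOException {
        if (blockSize < 16) {
            // Assumption: a window must be able to hold at least one whole
            // multi-byte character, otherwise the loop could stop making progress.
            throw new IllegalArgumentException("blockSize must be at least 16");
        }
        CharsetDecoder decoder = charset.newDecoder(); // reports malformed input by default
        CharBuffer out = CharBuffer.allocate(blockSize); // reused for every window
        int end = in.limit();
        while (true) {
            // limit = position + BLOCK_SIZE, clamped to the real end of the data.
            int windowEnd = (int) Math.min((long) in.position() + blockSize, end);
            in.limit(windowEnd);
            boolean endOfInput = (windowEnd == end);
            // decode() advances in.position(); a character split across the window
            // boundary is left unconsumed and picked up by the next window.
            CoderResult result = decoder.decode(in, out, endOfInput);
            if (result.isError()) {
                result.throwException(); // CharacterCodingException
            }
            drain(out, sink);
            if (endOfInput && result.isUnderflow()) {
                break;
            }
        }
        decoder.flush(out);
        drain(out, sink);
    }

    // Maps a file read-only and decodes it block by block.
    static void decodeFile(Path path, Charset charset, Consumer<CharBuffer> sink)
            throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            decodeInBlocks(mapped, charset, BLOCK_SIZE, sink);
        }
    }

    // Hands any decoded characters to the sink, then empties the buffer for reuse.
    private static void drain(CharBuffer out, Consumer<CharBuffer> sink) {
        out.flip();
        if (out.hasRemaining()) {
            sink.accept(out);
        }
        out.clear();
    }
}
```

A caller could, for instance, pass a sink that scans each CharBuffer for line breaks or simply counts characters; because the buffer is reused, anything that must outlive the callback has to be copied out first.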

Peter Thiel’s Unorthodox Management Philosophy of Extreme Focus “What are your top five priorities for this week?” “What are the top three objectives and key results you’re using to measure how you’re doing for the quarter?” These are questions that get thrown around by managers at work to help their teams prioritize and focus on achieving the most important accomplishments. In Peter Thiel’s view, this doesn’t go far enough.

The First Few Milliseconds of an HTTPS Connection Convinced from spending hours reading rave reviews, Bob eagerly clicked “Proceed to Checkout” for his gallon of Tuscan Whole Milk and… Whoa! What just happened?

16.7. mmap — Memory-mapped file support Memory-mapped file objects behave like both strings and like file objects. Unlike normal string objects, however, these are mutable. You can use mmap objects in most places where strings are expected; for example, you can use the re module to search through a memory-mapped file. Since they’re mutable, you can change a single character by doing obj[index] = 'a', or change a substring by assigning to a slice: obj[i1:i2] = '...'.

What do LimeWire, Napster, Kazaa, and Isohunt all have in common? LimeWire now joins the ignoble club of sites and services around the world that have been found liable for inducing, contributing to, or authorizing massive online copyright infringement. Other well known sites and services found liable on these and other secondary liability or criminal theories include Napster, Aimster, Grokster, Kazaa, Pirate Bay, Mininova, Newzbin, and Isohunt. Courts around the world have not tolerated or been willing to countenance online businesses whose core business model involves profiting from facilitating online copyright infringement. The most recent example is LimeWire. The recording industry just won a major copyright piracy lawsuit in the US against LimeWire, one of the most popular remaining p2p music file sharing services in the US. The court held there was “overwhelming evidence that LimeWire engaged in purposeful conduct that fostered infringement” to support liability based on inducement.

How Non-Member Functions Improve Encapsulation I'll start with the punchline: If you're writing a function that can be implemented as either a member or as a non-friend non-member, you should prefer to implement it as a non-member function. That decision increases class encapsulation. When you think encapsulation, you should think non-member functions.