Technical Tips

Ewanhiggs / csv-game. The CSV Game is a collection of examples of CSV parsing programs which have two tests: report the number of fields in a CSV file and take the sum of the values in a single column. It began when I saw this Rob Miller talk from GopherCon 2014 about Heka where he claims that Go is so slow at parsing CSV messages that they pass the data over protocol buffers to a LuaJIT process which parses the message and sends the data back over protocol buffers, and that this is quicker than just reading it in Go (14:45 in the video). I could hardly believe this so I wrote some sample code myself to check it.
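The two tests described above are small enough to sketch in a few lines. The following is only an illustration using Python's standard csv module, not one of the implementations from the csv-game repository; the function names, the sample data and the assumption that the summed column holds integers are all hypothetical.

```python
import csv
import io


def count_fields(stream):
    """Return the total number of fields across all rows of a CSV stream."""
    return sum(len(row) for row in csv.reader(stream))


def sum_column(stream, column):
    """Return the sum of the integer values in one zero-based column."""
    return sum(int(row[column]) for row in csv.reader(stream))


# Hypothetical sample data: three rows of three integer fields each.
sample = "1,2,3\n4,5,6\n7,8,9\n"

field_total = count_fields(io.StringIO(sample))
column_total = sum_column(io.StringIO(sample), 1)
print(field_total, column_total)
```

For a real file, the StringIO objects would be replaced by a file opened with newline="" as the csv module documentation recommends.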

Sure enough, I found Go to be pretty slow at parsing CSV files. I discussed this with some friends and they contributed other versions in various languages. So I've collected them here. I don't claim that all of the implementations are representative of idiomatic code, so PRs are most certainly welcome! There are two tests.

Scientific python cheat sheet by IPGP. Pure Python

Types
a = 2 # integer
b = 5.0 # float
c = 8.3e5 # exponential
d = 1.5 + 0.5j # complex
e = 4 > 5 # boolean
f = 'word' # string

Lists
a = ['red', 'blue', 'green'] # manual initialization
b = list(range(5)) # initialization through a function
c = [nu**2 for nu in b] # initialize through list comprehension
d = [nu**2 for nu in b if nu < 3] # list comprehension with condition
e = c[0] # access element
f = c[1:2] # access a slice of the list
g = ['re', 'bl'] + ['gr'] # list concatenation
h = ['re'] * 5 # repeat a list
['re', 'bl'].index('re') # returns index of 're'
're' in ['re', 'bl'] # true if 're' in list
sorted([3, 2, 1]) # returns sorted list
z = ['red'] + ['green', 'blue'] # list concatenation

Dictionaries

Strings
a = 'red' # assignment
char = a[2] # access individual characters
'red ' + 'blue' # string concatenation
'1, 2, three'.split(',') # split string into list
'.'.join(['1', '2', 'three']) # concatenate list into string

Operators Control Flow IPython Python console <object>?

Fft. Unicode_hack. Vi(m) tip #2: Entering greek/math symbols using vim digraphs « Alec's Web Log. Lately I have been taking computer science/math class notes using vim. Since typing LaTeX is too cumbersome and not readily intuitive (you have to typeset it), I just use plain text. This is fine until I need to quickly type strange letters/symbols. I can do this in vim using digraphs. To see a list of available digraphs, in normal mode type: :digraphs To enter a digraph in insert mode, simply hit <ctrl>+k followed by the two symbols that make up the digraph, for example: <ctrl>kF* Below is a table of useful math and computer science digraphs. * I avoid these because they are double-width characters. Note: Greek letters are usually their Latin alphabet “equivalent” followed by a star, with uppercase Latin letters giving uppercase Greek letters and lowercase giving lowercase.

Note: “Superscript” and “subscript” numbers are all [digit]S for superscript and [digit]s for subscript. u[4-hex-digit value] U[8-hex-digit value] Leading zeros may be omitted. Note: On some machines <ctrl>v means paste; in that case use <ctrl>q.

Racket

Learn You Some Erlang for Great Good! Facebook’s code quality problem. Facebook’s code quality problem tl;dr: It looks like Facebook is getting the textbook results of ignoring code quality. Facebook has a software quality problem. I’m going to try to convince you with three examples. This is important because it demonstrates the time-honored principle that quality matters. It demonstrates it, as Facebook engineers like to say, at scale. Exhibit A: “iOS can’t handle our scale” About a month ago a Facebook engineer gave this presentation: iOS at Facebook, which was followed by a discussion on reddit.

The Facebook iOS app has over 18,000 Objective-C classes, and in a single week 429 people contributed to it. This comment from ChadBan on reddit sums it up: All I can think of when reading this is Martin Fowler’s Design Stamina Hypothesis on what happens to a system without architecture. Fast Database Restarts at Facebook. Our key observation is that we can decouple the memory lifetime from the process lifetime. Fail at Scale. The Architecture of Open Source Applications. Riak and Erlang/OTP. Riak is a distributed, fault tolerant, open source database that illustrates how to build large scale systems using Erlang/OTP. Thanks in large part to Erlang's support for massively scalable distributed systems, Riak offers features that are uncommon in databases, such as high-availability and linear scalability of both capacity and throughput.

Erlang/OTP provides an ideal platform for developing systems like Riak because it provides inter-node communication, message queues, failure detectors, and client-server abstractions out of the box. What's more, most frequently-used patterns in Erlang have been implemented in library modules, commonly referred to as OTP behaviors. They contain the generic code framework for concurrency and error handling, simplifying concurrent programming and protecting the developer from many common pitfalls.

A complete Erlang system such as Riak is a set of loosely coupled applications that interact with each other. 15.1. -module(factorial). 15.2. 15.3. K'th Smallest/Largest Element in Unsorted Array | Set 1. Given an array and a number k where k is smaller than the size of the array, we need to find the k’th smallest element in the given array. It is given that all array elements are distinct. Examples: We have discussed a similar problem to print k largest elements.

Method 1 (Simple Solution) A simple solution is to sort the given array using an O(n log n) sorting algorithm like Merge Sort, Heap Sort, etc. and return the element at index k-1 in the sorted array. Time complexity of this solution is O(n log n). K'th smallest element is 5

Method 2 (Using Min Heap – HeapSelect) We can find the k’th smallest element in time complexity better than O(n log n). Output: Time complexity of this solution is O(n + k log n).

Method 3 (Using Max-Heap) We can also use a Max Heap for finding the k’th smallest element. 1) Build a Max-Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. 2) For each element after the k’th element (arr[k] to arr[n-1]), compare it with the root of MH; if it is smaller than the root, make it the root and restore the heap property, otherwise ignore it. 3) Finally, the root of MH is the k’th smallest element. Time complexity of this solution is O(k + (n-k) log k).

Mean Shift Clustering Overview. Mean shift clustering is one of my favorite algorithms. It’s a simple and flexible clustering technique that has several nice advantages over other approaches. In this post I’ll provide an overview of mean shift and discuss some of its strengths and weaknesses.
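Returning briefly to Method 3 of the k'th smallest element excerpt above: the max-heap approach can be sketched as follows. This is a minimal illustration rather than the code from the original article. Python's heapq module only provides a min-heap, so the sketch stores negated values to simulate a max-heap, and the function name and sample array are hypothetical.

```python
import heapq


def kth_smallest(arr, k):
    """Return the k'th smallest element of arr using a max-heap of size k."""
    if not 1 <= k <= len(arr):
        raise ValueError("k must be between 1 and len(arr)")

    # heapq is a min-heap, so negated values are stored to simulate a max-heap.
    max_heap = [-x for x in arr[:k]]
    heapq.heapify(max_heap)  # O(k)

    # Compare each remaining element with the current maximum (the root).
    for x in arr[k:]:
        if x < -max_heap[0]:
            heapq.heapreplace(max_heap, -x)  # O(log k)

    # The root now holds the largest of the k smallest elements.
    return -max_heap[0]


result = kth_smallest([12, 3, 5, 7, 19], 2)
print("K'th smallest element is", result)
```

The heap never grows beyond k entries, which is where the O(k + (n-k) log k) bound comes from.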

All of the code used in this blog post can be found on github. Kernel Density Estimation The first step when applying mean shift (and all clustering algorithms) is representing your data in a mathematical manner. Mean shift builds upon the concept of kernel density estimation (KDE). Below is the KDE surface for our points above using a Gaussian kernel with a kernel bandwidth of 2. Mean Shift So how does mean shift come into the picture? Depending on the kernel bandwidth used, the KDE surface (and end clustering) will be different. The top animation results in three KDE surface peaks, and thus three clusters.

The Mean Shift Algorithm The general algorithm outline is: The shift function looks like this: Image Segmentation Application.
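The outline and shift function from the original post did not survive in this excerpt, so what follows is only a rough sketch of one common formulation, not the author's code. It assumes a Gaussian kernel, shifts every point towards the kernel-weighted mean of the original data until the movement falls below a tolerance, and then merges points that ended up near the same peak. All function names, parameter values and sample points are hypothetical; math.dist requires Python 3.8 or later.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Unnormalised Gaussian weight for a given distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def shift_point(point, points, bandwidth):
    """Move one point to the kernel-weighted mean of the original points."""
    weights = [gaussian_kernel(math.dist(point, p), bandwidth) for p in points]
    total = sum(weights)
    return tuple(
        sum(w * p[i] for w, p in zip(weights, points)) / total
        for i in range(len(point))
    )


def mean_shift(points, bandwidth, iterations=50, tolerance=1e-6):
    """Shift a copy of every point uphill on the KDE surface until it settles."""
    shifted = list(points)
    for _ in range(iterations):
        moved = 0.0
        for i, point in enumerate(shifted):
            new_point = shift_point(point, points, bandwidth)
            moved = max(moved, math.dist(point, new_point))
            shifted[i] = new_point
        if moved < tolerance:
            break
    return shifted


def group_modes(shifted, radius):
    """Label each shifted point by merging points that lie near the same peak."""
    modes, labels = [], []
    for point in shifted:
        for label, mode in enumerate(modes):
            if math.dist(point, mode) < radius:
                labels.append(label)
                break
        else:
            modes.append(point)
            labels.append(len(modes) - 1)
    return labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels = group_modes(mean_shift(points, bandwidth=1.0), radius=0.5)
print(labels)
```

As the text notes, the result depends on the kernel bandwidth: a larger value smooths the KDE surface and tends to merge clusters, while a smaller one tends to split them.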

Ruby

Graphs. Tests. Immutable. Becoming a Data Scientist - Curriculum via Metromap - Pragmatic Perspectives. Data Science, Machine Learning, Big Data Analytics, Cognitive Computing … well, all of us have been avalanched with articles, skills-demand infographics and points of view on these topics (yawn!). One thing is for sure: you cannot become a data scientist overnight. It's a journey, and for sure a challenging one. But how do you go about becoming one? Where to start?

When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do I need to know? Given how critical visualization is for data science, ironically I was not able to find (except for a few) a pragmatic and yet visual representation of what it takes to become a data scientist. The areas are: Fundamentals, Statistics, Programming, Machine Learning, Text Mining / Natural Language Processing, Data Visualization, Big Data, Data Ingestion, Data Munging, Toolbox. Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. Greenfoot. How to write the perfect pull request. Serve a local file as http (once only) Typhoeus/typhoeus. Place Autocomplete - Google Places API. Looking to use this service in a JavaScript application? Check out the Places Library of the Google Maps API v3. The Place Autocomplete service is a web service that returns place predictions in response to an HTTP request.

The request specifies a textual search string and optional geographic bounds. The service can be used to provide autocomplete functionality for text-based geographic searches, by returning places such as businesses, addresses and points of interest as a user types. Examples of Mobile Apps using Place Autocomplete The Place Autocomplete service is useful in mobile apps, where you may want to offer users a location-based autocomplete feature.
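As a rough illustration of what such a request looks like, the sketch below only builds a request URL and does not send it. The endpoint and the parameter names (input, key, location, radius) are written from memory of the public Places web service documentation rather than taken from this excerpt, so treat them as assumptions to check against the current reference. The helper name, the search text and the API key placeholder are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for JSON output; verify against the official documentation.
BASE_URL = "https://maps.googleapis.com/maps/api/place/autocomplete/json"


def autocomplete_url(search_text, api_key, **optional):
    """Build a Place Autocomplete request URL without sending it."""
    params = {"input": search_text, "key": api_key}
    # Optional geographic bounds, e.g. location="lat,lng" and radius in metres.
    params.update(optional)
    return BASE_URL + "?" + urlencode(params)


url = autocomplete_url(
    "Vict",
    "YOUR_API_KEY",  # placeholder, not a real key
    location="37.76999,-122.44696",
    radius=500,
)
print(url)
```

Sending a new request for each keystroke as the user types is what gives the autocomplete behaviour described above, and each of those requests counts against the shared quota.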

The following images show mobile applications using Place Autocomplete to assist a user in finding a location. Place Autocomplete Requests The Place Autocomplete service is part of the Google Places API and shares an API key and quotas with the Google Places API. A Place Autocomplete request is an HTTP URL of the following form: Unix Background Queue. For a side-project to be run on a single machine I needed a background queue. I like self-contained software like sqlite, but I didn’t know of any self-contained background queue. They usually rely on some kind of broker, whether that is Redis or a database. I decided it would be fun to write one! Here’s the weekend story of toying with Unix, Ruby C extensions, MRI and Ruby to create localjob.

Unix inter-process communication To engineer my self-contained solution I looked into Unix’s IPC functionality, the classics include: Files. I stumbled upon the POSIX message queue during my research, which has everything I was looking for: Persistent. Creating a Ruby wrapper for the POSIX message queue Ruby’s standard library does not provide access to the POSIX message queue, which meant I’d have to roll my own with a Ruby C extension. POSIX message queue provides blocking calls like mq_receive(3) and mq_send(3).

It was a fun experience creating a Ruby C extension. Localjob is born Multiple queues. Event loops demystified - Practicing Ruby, Issue 5.3. This issue of Practicing Ruby was contributed by Magnus Holm (@judofyr), a Ruby programmer from Norway. Magnus works on various open source projects (including the Camping web framework), and writes articles over at the timeless repository. Working with network I/O in Ruby is so easy:

require 'socket'

# Start a server on port 9234
server = TCPServer.new('0.0.0.0', 9234)

# Wait for incoming connections
while io = server.accept
  io << "HTTP/1.1 200 OK\r\n\r\nHello world!"
  io.close
end

# Visit in your browser.

Boom, a server is up and running! Working in Ruby has some disadvantages, though: we can handle only one connection at a time.

We can also have only one server running at a time. There are several ways to improve this situation, but lately we've seen an influx of event-driven solutions. Although these solutions might seem like silver bullets, there are subtle details that you'll have to think about. Obligatory chat server example Event handling The IO loop IO events. Why You Should Be Excited About Garbage Collection in Ruby 2.0. You may have heard last week how Innokenty Mihailov’s great Enumerable::Lazy feature was accepted into the Ruby 2.0 code base. But you may not have heard about an even more significant change that was merged into Ruby 2.0 in January: a new algorithm for garbage collection called “Bitmap Marking.” The developer behind this sophisticated and innovative change, Narihiro Nakamura, has been working on this since 2008 at least and also implemented the “Lazy Sweep” garbage collection algorithm already included in Ruby 1.9.3.

The new Bitmap Marking GC algorithm promises to dramatically reduce overall memory consumption by all Ruby processes running on a web server! But what does “bitmap marking” really mean? And exactly why will it reduce memory consumption? If you know Japanese you can read a detailed academic paper published in 2008 by Narihiro Nakamura along with Yukihiro (“Matz”) Matsumoto. Mark and Sweep Ruby allocates and organizes these RValue structures in arrays called “heaps.” Keyboard - How to permanently swap esc and caps lock in xfce / xubuntu?

Screencast: Coding Conway’s Game of Life in Ruby the TDD Way with RSpec. Recently, there have been many screencasts of people coding things in real time. Yesterday, Ryan Bigg released a video of him implementing Conway's Game of Life from scratch by reading through the 'rules' and then using RSpec to take a test driven approach to fleshing out the functionality. Ryan is a Ruby Hero and technical writer best known for being co-author of the recently released Rails 3 in Action (along with Yehuda Katz) which I'll be reviewing soon for Ruby Inside. But Ryan's also been getting into doing a little screencasting: If you can't see the video above, view it directly on Vimeo here. Ryan's technique is just one of many legitimate approaches but many of you will find something to pick up from this, especially if you're not familiar with test driven development or, perhaps, RSpec.

If you're already working on koans non-stop and consider yourself well versed in the ways of TDD, you might want to skip it. The Nature of Lisp. Monday, May 8, 2006 Introduction When I first stumbled into Lisp advocacy on various corners of the web I was already an experienced programmer. At that point I had grokked what seemed at the time a wide range of programming languages.

I was proud to have the usual suspects (C++, Java, C#, etc.) on my service record and was under the impression that I knew everything there is to know about programming languages. I couldn't have possibly been more wrong. My initial attempt to learn Lisp came to a crashing halt as soon as I saw some sample code. The moment I regained my sight I communicated my frustrations to some members of the Lisp sect. For many months the Lisp advocates pressed on. The enlightenment came instantaneously. That very second I became a member of the Lisp cult. I gave the matter careful thought. I shared my ideas with fellow Lispers. XML Reloaded A thousand mile journey starts with a single step. <todo name="housework"><item priority="high">Clean the house. So, where are we? Submodules. It often happens that while working on one project, you need to use another project from within it. Perhaps it’s a library that a third party developed or that you’re developing separately and using in multiple parent projects.

A common issue arises in these scenarios: you want to be able to treat the two projects as separate yet still be able to use one from within the other. Here’s an example. Suppose you’re developing a web site and creating Atom feeds. Instead of writing your own Atom-generating code, you decide to use a library. Git addresses this issue using submodules. Starting with Submodules We’ll walk through developing a simple project that has been split up into a main project and a few sub-projects. Let’s start by adding an existing Git repository as a submodule of the repository that we’re working on. By default, submodules will add the subproject into a directory named the same as the repository, in this case “DbConnector”.

First you should notice the new .gitmodules file. 5 Ways to Send Email From Linux Command Line - TecAdmin.net. Light Table Workflow for Interactive Clojure Development - Safari Blog. Zsh: 14. Expansion. cURL - Tutorial. How to auto deploy Rails apps after Git push ··· Nico Hagenburger. Rails Rumble Gem Teardown - DwellableTrends. Linux - Detect number of IDLE processors ruby - Stack Overflow.