35 Google open-source projects that you probably don't know about

This text is a translation of the Polish article "34 projekty Open Source udostępnione przez Google" ("34 Open Source projects released by Google").


The list is now longer than 35 projects. While translating it from Polish to English I added one new project, which is why the title says 35 instead of 34 ;). After later updates there are even more. Sorry for the confusion.

Google is one of the biggest companies supporting the open source movement. It has released more than 500 open source projects (most of them are samples showing how to use its APIs). In this article I describe the most interesting free releases from Google; some of them may be abandoned.


A list of projects developed at Google and released as open source (thanks @dobs from reddit) can also be found here.

Text File processing

Google CRUSH (Custom Reporting Utilities for SHell)
CRUSH is a collection of tools for processing delimited-text data from the command line or in shell scripts. A tutorial on how to use it is here.
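CRUSH's own commands and flags are not reproduced here. As a rough idea of the kind of job these tools do, here is a minimal Python sketch that keeps selected columns of tab-delimited data by their header names. `cut_fields` is a hypothetical helper, not part of CRUSH.

```python
import csv
import io


def cut_fields(text, fields, delimiter="\t"):
    """Return only the named columns of delimited text that has a header row.

    Hypothetical helper for illustration; it is not one of the CRUSH tools.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Look up each requested column by its header label.
    indexes = [header.index(name) for name in fields]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    for row in reader:
        writer.writerow([row[i] for i in indexes])
    return out.getvalue()


data = "date\thost\tbytes\n2009-11-01\ta\t120\n2009-11-01\tb\t80\n"
print(cut_fields(data, ["bytes", "host"]), end="")
```

Selecting by header name rather than by column number keeps scripts working when someone adds or reorders columns in the input.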

C++ libraries and sources

Google Breakpad
An open-source multi-platform crash reporting system. Breakpad is a minidump-generation library used for snapshotting processes out in the field for later analysis. The minidump format is similar to core files but was developed by Microsoft for its crash-uploading facility. A minidump-creation library for Mac/Linux has been implemented so that the crash-processing back end only needs to understand one format.
Google GFlags
The gflags package contains a library that implements command-line flag processing. As such, it is a replacement for getopt(), with increased flexibility, including built-in support for C++ types like string. Here is an introduction to using it.
Google Glog
The glog library implements application-level logging. It provides logging APIs based on C++-style streams and various helper macros, and it can be used under Linux, BSD, and Windows. Here is an introduction to using glog.
Google PerfTools
These tools help developers create more robust applications and are especially useful to those developing multi-threaded applications in C++ with templates. They include TCMalloc, a heap checker, a heap profiler and a CPU profiler. Instructions for using PerfTools can be found here and here.
Google Sparse Hash
An extremely memory-efficient hash_map implementation, with 2 bits/entry overhead. The sparsehash package consists of two hash table implementations: sparse, which is designed to be very space-efficient, and dense, which is designed to be very time-efficient. For each one, the package provides both a hash map and a hash set, mirroring the classes in the common STL implementation. Docs are here.
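The space savings of the sparse variant come from storing only occupied buckets. Below is a simplified Python sketch of that idea, assuming a fixed-size group made of a bitmap plus a packed list of values; the group size of 48 is an arbitrary choice here, sparsehash's real layout differs, and Python objects will not actually show the memory savings. The index of a value in the packed list is the number of occupied slots before its position.

```python
class SparseGroup:
    """Fixed-size array that stores only its occupied slots (simplified sketch).

    `bitmap` has bit i set when position i holds a value; `values` keeps the
    stored values packed in position order, so a position's index in `values`
    is the number of occupied positions before it.
    """

    def __init__(self, size=48):
        self.size = size
        self.bitmap = 0
        self.values = []

    def _occupied(self, pos):
        return bool((self.bitmap >> pos) & 1)

    def _rank(self, pos):
        # Count set bits strictly below `pos`.
        return bin(self.bitmap & ((1 << pos) - 1)).count("1")

    def get(self, pos, default=None):
        return self.values[self._rank(pos)] if self._occupied(pos) else default

    def set(self, pos, value):
        if not 0 <= pos < self.size:
            raise IndexError(pos)
        if self._occupied(pos):
            self.values[self._rank(pos)] = value        # overwrite in place
        else:
            self.values.insert(self._rank(pos), value)  # keep position order
            self.bitmap |= 1 << pos

    def erase(self, pos):
        if self._occupied(pos):
            del self.values[self._rank(pos)]
            self.bitmap &= ~(1 << pos)


g = SparseGroup()
g.set(40, "x")
g.set(3, "y")
print(g.get(3), g.get(40), g.get(7), len(g.values))  # y x None 2
```

Empty slots cost one bitmap bit each instead of a whole bucket, which is the trade-off that makes the sparse table small at the price of slower inserts.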
Omaha - Google Update
Omaha, otherwise known as Google Update, is a program to install requested software and keep it up to date. So far, Omaha supports many Google products for Windows, including Google Chrome and Google Earth, but there is no reason for it to only support Google products. Here are the Omaha Overview and the Developers Setup Guide.
Protocol Buffers
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats. Here is the developer guide. Protocol Buffers can be used from many languages and are supported by a few IDEs, for example NetBeans.
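To give a feel for why the encoding is compact: integers are written as base-128 varints, and each field is prefixed by a key that combines the field number and a wire type. The sketch below hand-encodes two fields following the published wire-format rules. It is for illustration only; in practice you would use the code generated by protoc rather than writing an encoder yourself.

```python
def encode_varint(value):
    """Base-128 varint: 7 payload bits per byte, low-order group first;
    the high bit of a byte is set when more bytes follow."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(field_number, value):
    """Key varint (field_number << 3 | wire_type) followed by the payload.
    Wire type 0 = varint (used here for ints), 2 = length-delimited (str)."""
    if isinstance(value, int):
        return encode_varint(field_number << 3 | 0) + encode_varint(value)
    payload = value.encode("utf-8")
    return encode_varint(field_number << 3 | 2) + encode_varint(len(payload)) + payload


print(encode_field(1, 150).hex())        # 089601
print(encode_field(2, "testing").hex())  # 120774657374696e67
```

Field names never appear on the wire, only numbers, which is one reason the format stays small and why old readers can skip fields they do not know.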

The Internet

Google Code Prettify
A JavaScript module and CSS file that allow syntax highlighting of source code snippets in an HTML page. It supports C/C++, Java, Python, Ruby, PHP, Visual Basic, AWK, Bash, SQL, HTML, XML, CSS, JavaScript, Makefiles and some Perl. Not supported: Smalltalk and all *CAML*. For an example, click here.
SpriteMe - easy "CSS sprites"
SpriteMe makes it easy to create CSS sprites (combining many small images into one larger image to reduce the number of requests to the web server when a page loads). This project is also available as a service at: http://spriteme.org/.
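The CSS half of a sprite is just offset arithmetic. Here is a minimal Python sketch, assuming the images are stacked vertically in one file called `sprite.png` (a hypothetical name) with an optional gap between them; building the combined image itself is not shown, and this is not SpriteMe's code.

```python
def sprite_css(images, gap=0, sprite_url="sprite.png"):
    """Lay images out top to bottom in one sprite and emit one CSS rule each.

    `images` is a list of (css_class, width, height). Each rule shows the
    shared sprite shifted up by that image's vertical offset.
    Returns (css_text, (sheet_width, sheet_height)).
    """
    rules, y, sheet_width = [], 0, 0
    for name, w, h in images:
        offset = f"-{y}px" if y else "0"
        rules.append(
            f".{name} {{ background: url({sprite_url}) 0 {offset} no-repeat; "
            f"width: {w}px; height: {h}px; }}"
        )
        y += h + gap
        sheet_width = max(sheet_width, w)
    sheet_height = y - gap if images else 0
    return "\n".join(rules), (sheet_width, sheet_height)


css, size = sprite_css([("home", 16, 16), ("search", 16, 16), ("logo", 120, 40)], gap=2)
print(css)
print(size)  # (120, 76)
```

Three icons now cost one HTTP request instead of three; the gap only guards against neighbouring images bleeding in when an element is slightly larger than its icon.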
Reducisaurus
Reducisaurus is a web service for minifying and serving CSS and JS files. It is based on YUI Compressor and runs on App Engine.
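Reducisaurus relies on YUI Compressor; what follows is not that algorithm but a deliberately naive, regex-based CSS minifier, only to illustrate what "minifying" means.

```python
import re


def minify_css(css):
    """Naive CSS minifier: drop comments and redundant whitespace.

    Known gaps: comment-like text inside strings or url(...) is not protected,
    and trimming around ':' turns "a :hover" into "a:hover", which changes
    its meaning. A real minifier tracks context instead of using regexes.
    """
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove /* comments */
    css = re.sub(r"\s+", " ", css)                    # collapse whitespace runs
    css = re.sub(r"\s*([{};:,>])\s*", r"\1", css)     # trim around punctuation
    css = css.replace(";}", "}")                      # last ';' in a block is optional
    return css.strip()


sample = "/* header */\nh1, h2 {\n    color: #333;\n    margin: 0 auto;\n}\n"
print(minify_css(sample))  # h1,h2{color:#333;margin:0 auto}
```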
JaikuEngine
JaikuEngine is a social microblogging platform that runs on App Engine and powers Jaiku.com. For the mobile client source, see: Jaiku Mobile client. Here is the project README.
Selector Shell
The Selector Shell is a browser-based tool for testing what CSS becomes in different browsers. It works by taking some raw text, inserting a dynamic STYLE element into the HEAD with that raw text as its content, and then reading the CSSOM to see what the browser has parsed it into. It is written in JavaScript and can be tested here.
Google Feed Server
Google Feed Server is an open source Atom Publishing Protocol server based on the Apache Abdera framework. Google Feed Server provides a simple back end for data adapters, which allows developers to quickly deploy a feed for an existing data source such as a database. Google Feed Server also provides the Feed Server Client Tool (FSCT), which lets developers perform create, receive, update, and delete (CRUD) operations on a Feed Server feed. Here are links to get it up and running.
Melange, the Spice of Creation
The goal of this project is to create a framework for representing Open Source contribution workflows, such as the existing Google Summer of Code™ (GSoC) program. Using this framework, it will be possible to host future Google Summer of Code programs (and other similar programs, such as the Google Highly Open Participation™ Contest, or GHOP) on Google App Engine. Here you can check out the Getting Started Guide.
namebench
This project hunts down the fastest DNS servers available for your computer to use. namebench runs a fair and thorough benchmark using your web browser history, tcpdump output, or standardized datasets in order to provide an individualized recommendation. namebench is completely free and does not modify your system in any way. The project began as a 20% project at Google. namebench runs on Mac OS X, Windows, and UNIX, and is available with a graphical user interface as well as a command-line interface. BTW: Google has its own free public caching DNS servers at ip: i
Rat Proxy
A semi-automated, largely passive web application security audit tool, optimized for accurate and sensitive detection, and automatic annotation, of potential problems and security-relevant design patterns based on the observation of existing, user-initiated traffic in complex web 2.0 environments. It detects and prioritizes broad classes of security problems, such as dynamic cross-site trust model considerations, script inclusion issues, content serving problems, insufficient XSRF and XSS defenses, and much more. Docs are here. The project is written and maintained by Michał Zalewski (lcamtuf).
Top Draw
Top Draw is an image generation program. Using simple text scripts based on the JavaScript programming language, Top Draw can create surprisingly complex and interesting images. The cool part is that the program has built-in support for installing your image as your desktop image. There's even a Viewer application that can be installed in the menu bar to run automatically with the parameters you've specified (such as the selected script and update interval). The project is developed in Xcode and runs on Mac OS X 10.5 (Leopard) or later.
EtherPad
This is the open source release of EtherPad, a web-based realtime collaborative document editor. The project exists mainly as an exhibition of the code, to help those who want to run or modify their own EtherPad servers, or those who are curious about how EtherPad's algorithms make realtime collaboration possible. Here are instructions on how to build EtherPad, and a screencast showing what it is all about. EtherPad uses JavaScript, Java and a Comet server to make realtime collaboration work.
Chromium
Chromium is the open-source project behind Google Chrome. The Chromium project aims to create a powerful platform for developing a new generation of web applications. There are few differences between Chrome and Chromium. Here are instructions on how to build Chromium on Linux. There are also official releases of Chrome for Windows, Mac and Linux.
V8 Google's open source JavaScript engine
V8 is Google's open source JavaScript engine. It is written in C++ and is used in Google Chrome, the open source browser from Google. V8 implements ECMAScript as specified in ECMA-262, 3rd edition, and runs on Windows XP and Vista, Mac OS X 10.5 (Leopard), and Linux systems that use IA-32 or ARM processors. V8 can run standalone or be embedded into any C++ application; here are some helpful docs on how to begin.
Chromium OS
Chromium OS is an open-source project that aims to build an operating system that provides a fast, simple, and more secure computing experience for people who spend most of their time on the web. Sources are available on: http://git.chromium.org/ src
Android
Android is the first free, open source, and fully customizable mobile platform. Android offers a full stack: an operating system, middleware, and key mobile applications. It also contains a rich set of APIs that allows third-party developers to develop great applications.

Tools for MySQL

Google MySQL Tools
Various tools for managing, maintaining, and improving the performance of MySQL databases, originally written by Google. This includes:
  • mypgrep.py - a tool, similar to pgrep, for managing mysql connections
  • compact_innodb.py - compacts innodb datafiles by dumping and reloading all tables
Google mMAIM
mMAIM's purpose is to make it easy to monitor and analyze MySQL servers and to integrate easily into any environment. It can show Master/Slave sync stats and some efficiency stats, can return statistics from most of the SHOW commands, and more!

Other projects

Stressful Application Test (stressapptest)
Stressful Application Test (or stressapptest, its Unix name) tries to maximize randomized traffic to memory from processor and I/O, with the intent of creating a realistic high-load situation in order to test the existing hardware devices in a computer. It has been used at Google for some time and is now available under the Apache 2.0 license. Here are some docs: Introduction, Installation Guide and User Guide.
Pop and IMAP Troubleshooter
The POP and IMAP troubleshooter serves to diagnose and solve connection problems from client machines to email services. It reads the client configuration files (Outlook, Windows Mail, Thunderbird, etc.), checks the individual settings, and then attempts to create POP, IMAP, and SMTP connections using these settings. The troubleshooter is coded in C++ using the Qt environment. It can be used generically, or can be customized for the demands of a particular email service.
Openduckbill
Openduckbill is a simple command line backup tool for Linux. It monitors the files and directories marked for backup for any changes and transfers these changes to a local backup directory, a remote NFS-exported partition, or a remote SSH server, using the common rsync command. Here is the installation guide.
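How Openduckbill detects changes internally is not described here. One simple way to work out what needs to be transferred is to compare snapshots of file sizes and modification times; here is a minimal polling sketch in Python, where `snapshot` and `diff_snapshots` are hypothetical names and not Openduckbill's code.

```python
import os


def snapshot(root):
    """Map every file under `root` (as a relative path) to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return (added, modified, removed) lists of relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, removed


# Usage idea: take a snapshot, wait, take another, then back up only
# the added and modified paths (for example by handing them to rsync).
```

Polling is the simplest approach to get right; a tool that runs continuously would more likely react to filesystem change notifications so it does not have to rescan everything.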
ZXing
ZXing (pronounced "zebra crossing") is an open-source, multi-format 1D/2D barcode image processing library implemented in Java. The project focuses on using the built-in camera on mobile phones to photograph and decode barcodes on the device, without communicating with a server. As far as I know, it can be found on the Android platform. Check out the Getting Started guide and the list of supported devices (my Sony Ericsson device is supported!).
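Once the bars have been read, a decoder still has to validate the digits. For EAN-13 codes that is a simple weighted checksum; the sketch below shows the standard rule in Python and is not ZXing's own code.

```python
DIGITS = set("0123456789")


def ean13_check_digit(first12):
    """Check digit for the first 12 digits of an EAN-13 code.

    Counting from the left, digits in odd positions have weight 1 and digits
    in even positions weight 3; the check digit rounds the weighted sum up to
    the next multiple of 10.
    """
    if len(first12) != 12 or not set(first12) <= DIGITS:
        raise ValueError("expected exactly 12 ASCII digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    if len(code) != 13 or not set(code) <= DIGITS:
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("4006381333931"))  # True
print(is_valid_ean13("4006381333932"))  # False
```

A failed checksum tells the scanner it misread a bar, so it can simply try the next camera frame instead of returning a wrong product code.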
Tesseract OCR Engine
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 little work was done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. A TIFF reader is built in that will read uncompressed TIFF images, or libtiff can be added to read compressed images. Here are the Readme and FAQ.
Neatx - Open Source NX server
Neatx is an Open Source NX server, similar to the commercial NX server from NoMachine. For more information, check out the project homepage. The NX protocol is much more robust than VNC, which can be useful over a slow Internet connection. See also: Major differences between NX and VNC. An alternative to Google's project is FreeNX (I have not tested it).
This is the code for the following paper: http://books.nips.cc/papers/files/nips20/NIPS2007_0435.pdf. It is an all-kernel-support version of SVM that can run in parallel on multiple machines. Here is how to use it.
The Go programming language
A new programming language developed at Google. It was released with the slogan: "Go, a systems programming language: expressive, concurrent, garbage-collected".
The Google Collections Library for Java
The Google Collections Library is a set of new collection types, implementations and related goodness for Java 5 and higher, brought to you by Google. It is a natural extension of the Java Collections Framework you already know and use.
Google styleguide
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style. "Style" covers a lot of ground, from “use camelCase for variable names” to “never use global variables” to “never use exceptions.” This project holds the style guidelines we use for Google code. If you are modifying a project that originated at Google, you may be pointed to this page to see the style guides that apply to that project. This is worth reading.


Google is one of the most active companies releasing open source software. On top of that, Google has organized the Summer of Code five times: a program in which students from all over the world work on open source projects and Google pays them a stipend for a few months of hard work.


Guice: a lightweight dependency injection framework for Java 5 and above
Thanks to JavaBeat for the summary. Google Guice is a dependency injection framework for applications in which the relationships (dependencies) between business objects would otherwise have to be maintained manually in application code. Since Guice supports Java 5.0, it takes advantage of generics and annotations, making the code type-safe. Documentation is here: Getting Started guide.
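Guice itself is a Java library. To illustrate only the idea (this is not Guice's API), here is a toy Python container that binds an interface to an implementation and builds constructor dependencies from type annotations. `Injector`, `bind`, `get` and the example classes are all hypothetical names.

```python
import inspect


class Injector:
    """Toy constructor-injection container (hypothetical; not Guice's API).

    bind() maps an abstract type to a concrete class; get() builds an object,
    first creating every annotated constructor argument.
    """

    def __init__(self):
        self._bindings = {}    # interface -> (implementation, is_singleton)
        self._singletons = {}  # implementation -> shared instance

    def bind(self, interface, implementation, singleton=False):
        self._bindings[interface] = (implementation, singleton)

    def get(self, cls):
        impl, singleton = self._bindings.get(cls, (cls, False))
        if singleton and impl in self._singletons:
            return self._singletons[impl]
        params = inspect.signature(impl.__init__).parameters.values()
        # Resolve each annotated constructor parameter recursively.
        kwargs = {
            p.name: self.get(p.annotation)
            for p in params
            if p.name != "self" and p.annotation is not inspect.Parameter.empty
        }
        obj = impl(**kwargs)
        if singleton:
            self._singletons[impl] = obj
        return obj


class Mailer:
    def send(self, to, body):
        raise NotImplementedError


class SmtpMailer(Mailer):
    def send(self, to, body):
        return f"SMTP to {to}: {body}"


class SignupService:
    def __init__(self, mailer: Mailer):  # dependency declared, not constructed here
        self.mailer = mailer

    def register(self, email):
        return self.mailer.send(email, "welcome")


injector = Injector()
injector.bind(Mailer, SmtpMailer, singleton=True)
service = injector.get(SignupService)
print(service.register("ann@example.com"))  # SMTP to ann@example.com: welcome
```

`SignupService` never names `SmtpMailer`, so a test can bind a fake mailer instead; the toy does not detect circular dependencies, which a real framework has to handle.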
Google Sitebricks - web framework powered by Guice
Sitebricks is a simple development layer for web applications built on top of Google Guice. Sitebricks focuses on early error detection, low-footprint code, and fast development. Like Guice, it also balances idiomatic Java with an emphasis on concise code.
Here are the Getting Started guide and a 5-minute tutorial.
Google ctemplate
CTemplate is a simple but powerful template language for C++. It emphasizes separating logic from presentation: it is impossible to embed application logic in this template language. Here is some documentation.
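ctemplate marks variables as `{{NAME}}` and sections as `{{#NAME}}...{{/NAME}}`. The toy Python renderer below mimics that idea under simplified assumptions (a section is repeated once per dictionary in a list and hidden when the list is missing or empty, and every value is HTML-escaped); it is not ctemplate's C++ API.

```python
import html
import re

# {{#NAME}}...{{/NAME}} is a section; {{NAME}} is a variable.
TOKEN = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}|\{\{(\w+)\}\}", re.S)


def render(template, data):
    """Toy ctemplate-style renderer (hypothetical, not ctemplate's API).

    The template can only show, hide or repeat text; all decisions are made
    by the code that builds `data`.
    """
    def replace(match):
        section, body, variable = match.groups()
        if section is not None:
            rows = data.get(section, [])
            return "".join(render(body, {**data, **row}) for row in rows)
        return html.escape(str(data.get(variable, "")))

    # A single re.sub pass: substituted values are never re-scanned for markers.
    return TOKEN.sub(replace, template)


page = "<h1>{{TITLE}}</h1><ul>{{#ITEM}}<li>{{NAME}}</li>{{/ITEM}}</ul>"
print(render(page, {"TITLE": "Tools & libs",
                    "ITEM": [{"NAME": "glog"}, {"NAME": "gflags"}]}))
# <h1>Tools &amp; libs</h1><ul><li>glog</li><li>gflags</li></ul>
```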

Thanks nostrademons from reddit.com
Google C++ Mocking Framework
Inspired by jMock, EasyMock, and Hamcrest, and designed with C++'s specifics in mind, Google C++ Mocking Framework (or Google Mock for short) is a library for writing and using C++ mock classes. Google Mock:
  • lets you create mock classes trivially using simple macros,
  • supports a rich set of matchers and actions,
  • handles unordered, partially ordered, or completely ordered expectations,
  • is extensible by users, and
  • works on Linux, Mac OS X, Windows, Windows Mobile, MinGW, and Symbian.
Here are the Getting Started guide and Google C++ Mocking for Dummies.

Thanks richq from reddit.com
Google C++ Testing Framework
Google's framework for writing C++ tests on a variety of platforms (Linux, Mac OS X, Windows, Cygwin, Windows CE, and Symbian). Based on the xUnit architecture. Supports automatic test discovery, a rich set of assertions, user-defined assertions, death tests, fatal and non-fatal failures, value- and type-parameterized tests, various options for running the tests, and XML test report generation. Here are the Google Test Primer and the Google Test Dev Guide.

Thanks richq from reddit.com
Google Toolbox for Mac
A collection of source code from different Google projects that may be useful to developers working on the Mac. This package includes the Google Developer Spotlight Importers. The release notes can be found here.

Thanks buffi from reddit.com
OCRopus
This is not entirely a Google project, but its development is sponsored by Google. OCRopus™ is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modelling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90s and deployed by the US Census Bureau, and novel high-performance layout analysis methods. OCRopus is initially intended for high-throughput, high-volume document conversion efforts, but its authors expect it to be an excellent OCR system for many other applications as well. Here are a usage guide and a guide to installing the development version.

Thanks 13xforever from reddit.com
Ganeti
Ganeti is a cluster virtual server management software tool built on top of existing virtualization technologies such as Xen or KVM and other Open Source software. Ganeti requires pre-installed virtualization software on your servers in order to function. Once installed, the tool will take over the management part of the virtual instances (Xen DomU), e.g. disk creation management, operating system installation for these instances (in co-operation with OS-specific install scripts), and startup, shutdown, failover between physical systems.

Thanks Matt Brown and btgeekboy from reddit.com
Skia
Skia is a complete 2D graphics library for drawing text, geometries, and images.
  • 3x3 matrices w/ perspective
  • antialiasing, transparency, filters
  • shaders, xfermodes, maskfilters, patheffects
Projects using Skia include Android and Chrome.
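The "3x3 matrices w/ perspective" item refers to transforming points in homogeneous coordinates: the bottom row of the matrix produces a divisor, and a bottom row other than (0, 0, 1) gives the perspective effect. A small Python sketch of the math (not Skia's API):

```python
def map_point(m, x, y):
    """Map (x, y) through a 3x3 matrix given as three rows.

    The point is treated as the homogeneous vector (x, y, 1); the bottom row
    yields the divisor w. Points where w == 0 are not handled here.
    """
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w


def concat(a, b):
    """Matrix product a x b; mapping through it applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


translate = [[1, 0, 10], [0, 1, 5], [0, 0, 1]]      # shift by (10, 5)
perspective = [[1, 0, 0], [0, 1, 0], [0.01, 0, 1]]  # w grows with x
print(map_point(translate, 1, 1))       # (11.0, 6.0)
print(map_point(perspective, 100, 50))  # (50.0, 25.0)
```

Because translation, scaling, rotation and perspective all fit in one 3x3 matrix, a whole chain of transforms can be concatenated once and then applied to every point.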

Thanks zxn0 from reddit.com
Google URL parsing and canonicalization library
A small library for parsing and canonicalizing URLs. You can find the README here.
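The library's own C++ API is not shown here. As an illustration of what "canonicalizing" means, here is a minimal Python sketch of a few typical rules built on the standard `urllib.parse` module: lowercase the scheme and host, drop the default port, resolve `.` and `..` path segments, and use `/` for an empty path. Real canonicalizers do much more (percent-encoding normalization, internationalized host names, and so on), and the port table below is an assumption of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def remove_dot_segments(path):
    """Resolve "." and ".." segments in an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            if len(out) > 1:    # never pop the leading "" that stands for the root
                out.pop()
        elif seg != ".":
            out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")          # "/a/b/.." names the directory "/a/"
    return "/".join(out) or "/"


def canonicalize(url):
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lower-cases the host name
    if ":" in host:              # IPv6 literal: put the brackets back
        host = f"[{host}]"
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = f"{userinfo}@{netloc}"
    path = remove_dot_segments(parts.path)
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(canonicalize("HTTP://Www.Example.COM:80/a/./b/../c?q=1#top"))
# http://www.example.com/a/c?q=1#top
```

Canonical forms let a browser or crawler treat differently spelled URLs as the same resource, for example when comparing history entries or cache keys.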

Thanks pkasting
Libjingle
Libjingle, the Google Talk Voice and P2P Interoperability Library, is a set of components for interoperating with Google Talk's peer-to-peer file sharing and voice calling capabilities (the source includes samples showing how to build a P2P app). The package includes source code for Google's implementation of Jingle and Jingle-Audio, two proposed extensions to the XMPP standard that are currently available in draft form. It is available for both Windows and UNIX/Linux operating systems. Here is the Developer Guide.

Thanks jbking
WebDriver (Selenium)
WebDriver is a sophisticated tool for automating web UI testing. It has a simple API designed to be easy to work with, and it can drive both real browsers, for testing JavaScript-heavy applications, and a pure 'in memory' solution for faster testing of simpler applications. You can check out the 5-minute introduction on the GettingStarted page. The project has since moved to http://selenium.googlecode.com/; please go there for the latest source.

Thanks ittiam
Google Gears
Gears is an open source project that enables more powerful web applications, by adding new features to your web browser:
  • Let web applications interact naturally with your desktop
  • Store data locally in a fully-searchable database
  • Run JavaScript in the background to improve performance
Gears is the fastest way to make your web app behave more like a desktop app.

Thanks Anonymous.
Google Web Toolkit (GWT)
Google Web Toolkit (GWT) is a development toolkit for building and optimizing complex browser-based applications. GWT is used by many products at Google, including Google Wave and Google AdWords. It's open source, completely free, and used by thousands of developers around the world.

Thanks Anonymous.
Native Client
Native Client is an open-source technology for running native code in web applications, with the goal of maintaining the browser neutrality, OS portability, and safety that people expect from web apps. It has been released at an early stage to get feedback from the open-source community. Native Client technology will probably help web developers create richer and more dynamic browser-based applications. Native Client runs on 32-bit x86 systems that use Windows, Vista, Mac OS X, or Linux. Some ARM and x86-64 support is implemented in the source base, and the team hopes to make it available to application developers later this year. Here are the Getting Started guide and FAQ.

Thanks zxn0 and ptman from reddit.com

Currently Native Client can run Quake in your browser! :)
Google Gadgets for Linux
Google Gadgets for Linux provides a platform for running desktop gadgets under Linux, catering to the unique needs of Linux users. It's compatible with the gadgets written for Google Desktop for Windows as well as the Universal Gadgets on iGoogle. Following Linux norms, this project is open-sourced under the Apache License. Here are the Getting Started Guide and instructions for building the project.

Thanks Tiger Dong
Google Caja
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications.

Thanks phosphorescente from reddit.com
Scarcity
Scarcity is a framework for concurrent garbage collection in C++. The framework is organized around the principle of "policy-based design", meaning that its behavior is customized and extended via template parameters. Policy-based design facilitates seamless integration with a broad set of VMs and other runtime environments by allowing the host environment to replace any aspect of the framework, such as thread synchronization primitives, atomic data types, error logging facilities, tracing strategies and so on.
Google concurrency library
A concurrency library for C++. Here is getting started guide.
CppClean
CppClean attempts to find problems in C++ source that slow development, particularly in large code bases. It is similar to lint; however, unlike other static analysis tools, which look for local problems, CppClean focuses on finding global inter-module problems. The goal is to find problems that slow development in large code bases that are modified over time, leaving unused code behind. This code can take many forms, from unused functions, methods, data members and types to unnecessary #include directives. Unnecessary #includes can cause considerable extra compilation, lengthening the edit-compile-run cycle.

Here are some details about the implementation.
Unladen swallow
An optimized branch of CPython, intended to be fully compatible and significantly faster. Unladen Swallow is Google-sponsored, but not Google-owned. The engineers on the project are full-time Google engineers, but ultimately this is an open-source project, not really that different from Chrome or Google Web Toolkit. Here is the Getting Started Guide.

Thanks Anonymous
Closure Tools
The Closure tools help developers to build rich web applications with JavaScript that is both powerful and efficient. The Closure Compiler compiles JavaScript into compact, high-performance code. The Closure Library is a broad, well-tested, modular, and cross-browser JavaScript library. Closure Templates simplify the task of dynamically generating HTML. Here is documentation.

Thanks Anonymous
SPDY
SPDY is an experiment with protocols for the web. Its goal is to reduce the latency of web pages. SPDY (pronounced "SPeeDY") is an application-layer protocol for transporting content over the web, designed specifically for minimal latency. There is a SPDY-enabled Google Chrome browser and an open-source web server. In lab tests, the Google team observed up to 64% reductions in page load times when using SPDY.

Thanks Anoop.

Update #2

Cmockery
There are a variety of C unit testing frameworks available; however, many of them are fairly complex and require the latest compiler technology. Some development requires the use of old compilers, which makes it difficult to use some unit testing frameworks. In addition, many unit testing frameworks assume the code being tested is an application or module that is targeted to the same platform that will ultimately execute the test. Because of this assumption, many frameworks require the inclusion of standard C library headers in the code module being tested, which may collide with the custom or incomplete implementation of the C library utilized by the code under test. Cmockery only requires that a test application be linked with the standard C library, which minimizes conflicts with standard C library headers. Also, Cmockery tries to avoid the use of some of the newer features of C compilers. For more information, check out the manual.
Perl AppEngine
This project aims to get Perl implemented as a supported language on Google App Engine. Want to help support Perl? Read Getting Started.
Perl ProtoBuf
Protocol Buffers for Perl.
Perl Sys::Protect
Perl XS module to override all "dangerous" Perl operations (any operation which interacts with the system). Notably, this module aims to provide the user with an environment identical to the restrictions in place on Google App Engine for Python.
Google App Engine
Google App Engine enables developers to build web applications on the same scalable systems that power Google's own applications. Google App Engine makes it easy to design scalable applications that grow from one to millions of users without infrastructure headaches. Here are some SDK Release Notes.
JRuby App Engine
JRuby on Google App Engine. With support for the Java Language, it's now possible to run Ruby code on Google App Engine. This project aims to make using JRuby as easy as any of the native App Engine languages. Although Google employees may participate in this project, the code is experimental and is not officially supported by Google.
Android Scripting
The Android Scripting Environment (ASE) brings scripting languages to Android by allowing you to edit and execute scripts and interactive interpreters directly on the Android device. These scripts have access to many of the APIs available to full-fledged Android applications, but with a greatly simplified interface. Want to know more? Check out the FAQ.
Eyes Free
Speech-enabled eyes-free Android applications. The Text-To-Speech (TTS) library allows developers to add speech to their applications. Developers give the TTS object a text string, and the TTS takes care of converting that string to speech and speaking it to the user. The TTS library is designed so that different underlying speech engines can be used without affecting the higher-level application logic. Currently, a port of the eSpeak engine is available. Here is the Getting Started Guide.
MAO - An Extensible Micro-Architectural Optimizer
This project seeks to build an infrastructure for micro-architectural optimizations at the instruction level. MAO is a stand-alone tool that works on the assembly level. MAO parses the assembly file, performs all optimizations, and re-emits another assembly file. After this, the assembler can be invoked to produce a binary object. MAO reuses much of the code in the GNU Assembler (gas) and needs binutils-2.19 to build correctly. Please see the README.txt file for information on how to build and run MAO. The current MAO version is an early prototype targeting x86.
Google documentation reader
Reading web-based developer documentation is different from browsing typical web pages. As a developer, you probably refer to key technical docs many times per day, and you want them well-organized, easy to navigate, and, above all, fast. This reader works with any open source project hosted on Google Code.
SocialGraph Node Mapper
The Social Graph Node Mapper is a community project to build a portable library to map social networking sites' URLs to and from a new canonical form.
Google visualization
This library makes it easy to implement a Visualization data source so that you can easily chart or visualize your data from any of your data stores. The library implements the Google Visualization API wire protocol and query language. You therefore need to write only the code required to make your data available to the library in the form of a data table. This task is made easier by the provision of abstract classes and helper functions.
This is an extension of the Torch3 Machine Learning library for handling various types of Deep Architectures and modifications to the standard Multi-layer Perceptrons:
  • Handles an arbitrary number of fully-connected sigmoidal layers
  • Unsupervised learning of MLPs using various reconstruction costs. Greedy layer-wise learning is available as well.
  • An implementation of the Stacked Denoising Autoencoders
  • A preliminary implementation of the collective learning idea, whereby a pair of networks are trained in parallel and communicate with each other.
A Google employee is involved in this project (it is not an official Google project). Documentation is here.
Bunny The Fuzzer
A closed loop, high-performance, general purpose protocol-blind fuzzer for C programs. Uses compiler-level integration to seamlessly inject precise and reliable instrumentation hooks into the traced program. These hooks enable the fuzzer to receive real-time feedback on changes to the function call path, call parameters, and return values in response to variations in input data. This architecture makes it possible to significantly improve the coverage of the testing process without the noticeable performance impact usually associated with other attempts to peek into run-time internals. A Google employee is involved in this project (it is not an official Google project). Here are some docs.
Thread weaver
Thread Weaver is a framework for writing multi-threaded unit tests in Java. It provides mechanisms for creating breakpoints within your code and for halting execution of a thread when a breakpoint is reached. Other threads can then run while the first thread is blocked. This allows you to write repeatable tests that can check for race conditions and thread safety. Here is the user guide.
Google coredumper
A neat tool for creating GDB-readable core dumps from multithreaded applications. The coredumper library can be compiled into applications to create core dumps of the running program without terminating it. It supports both single- and multi-threaded core dumps, even if the kernel does not natively support multi-threaded core files.
Rollcage API: Sandboxing for Windows
The Rollcage API can be used to sandbox an application on Windows. It is primarily used by Chromium, the open source browser project behind Google Chrome. Here is the design overview.
Google gtags
Server-based tags serving for large codebases, with clients in Python and for Emacs and Vim. This is an extension to GNU Emacs and XEmacs TAGS functionality, with a server-side component that narrows down the view of a potentially large TAGS file and serves the narrowed view over the wire for better performance. An Emacs Lisp client, a Python client, and Vim extensions are supplied.
PP
PP is intended to provide infrastructure and tools to describe and manipulate hardware registers and fields. Once described, it is possible to read and write fields symbolically. This allows one to browse the state of their hardware.
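PP's own description format is not shown here. The idea of describing register fields once and then reading and writing them by name can be sketched in Python as follows; the `CTRL` register layout in the example is made up, and the values are plain integers rather than reads from real hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A named bit field: `width` bits starting at bit `lsb` of a register."""
    name: str
    lsb: int
    width: int

    @property
    def mask(self):
        return ((1 << self.width) - 1) << self.lsb

    def read(self, reg):
        return (reg & self.mask) >> self.lsb

    def write(self, reg, value):
        if not 0 <= value < (1 << self.width):
            raise ValueError(f"{value} does not fit in field {self.name}")
        return (reg & ~self.mask) | (value << self.lsb)


class Register:
    """Describe a register once, then decode or update it by field name."""

    def __init__(self, name, *fields):
        self.name = name
        self.fields = {f.name: f for f in fields}

    def decode(self, reg):
        return {name: f.read(reg) for name, f in self.fields.items()}

    def update(self, reg, **values):
        for name, value in values.items():
            reg = self.fields[name].write(reg, value)
        return reg


# Hypothetical layout: bit 0 enable, bits 1-3 mode, bits 8-15 clock divider.
ctrl = Register("CTRL", Field("enable", 0, 1), Field("mode", 1, 3), Field("divider", 8, 8))
raw = 0x1A05
print(ctrl.decode(raw))                         # {'enable': 1, 'mode': 2, 'divider': 26}
print(hex(ctrl.update(raw, mode=5, enable=0)))  # 0x1a0a
```

Keeping the masks and shifts in one description means the error-prone bit arithmetic is written once, and every later read or write uses field names.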
iotools
The iotools package provides a set of simple command line tools which allow access to hardware device registers. Supported register interfaces include PCI, IO, memory mapped IO, SMBus, CPUID, and MSR. Also included are some utilities for simple arithmetic, logical, and other operations. If you ever have to debug hardware, you could probably use these tools.
sofia-ml
The suite of fast incremental algorithms for machine learning (sofia-ml) can be used for training models for classification or ranking, using several different techniques. This release is intended to aid researchers and practitioners who require fast methods for classification and ranking on large, sparse data sets. Includes methods for learning classification and ranking models, using Pegasos SVM, SGD-SVM, ROMMA, Passive-Aggressive Perceptron, Perceptron with Margins, and Logistic Regression.
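As an illustration of one of the listed methods, here is a Python sketch of the Pegasos update for a linear SVM: stochastic sub-gradient steps with step size 1/(λt), plus the optional projection step. It is not sofia-ml's implementation or command-line interface, and the tiny dataset is made up.

```python
import math
import random


def pegasos(examples, lam=0.01, iterations=5000, seed=0):
    """Linear SVM trained with Pegasos-style stochastic sub-gradient steps.

    `examples` is a list of (feature_list, label) with label in {-1, +1}.
    There is no separate bias term; append a constant 1.0 feature instead.
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    radius = 1.0 / math.sqrt(lam)
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)
        eta = 1.0 / (lam * t)                     # step size 1/(lambda * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # regularization (shrink) step
        if margin < 1.0:                          # hinge loss active: move toward y*x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:                         # optional projection step
            w = [wi * radius / norm for wi in w]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Made-up data: two clusters; the last feature is a constant bias input.
data = [([2, 1, 1], 1), ([1, 2, 1], 1), ([3, 3, 1], 1),
        ([-2, -1, 1], -1), ([-1, -2, 1], -1), ([-3, -3, 1], -1)]
w = pegasos(data)
print([predict(w, x) for x, _ in data])  # [1, 1, 1, -1, -1, -1]
```

Each step touches a single example, so the cost per iteration does not depend on the size of the training set, which is what makes this family of methods attractive for large, sparse data.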
A parallel C++ implementation of fast Gibbs sampling of Latent Dirichlet Allocation
stubl - Stateless (IPv6) Tunnel Broker for LANs
Stubl is a transition mechanism for providing a basic level of IPv6 connectivity to individual nodes on a private network. All that's required is a single Linux server with an IPv6 /64 subnet routed to it. The Stubl server consists of a Linux kernel module (stubl.ko) for handling the tunnel packets, and an HTTP server (stubl_http.py) for calculating clients' addresses and providing tunnel setup instructions. The main advantage of Stubl is that it allows a user on the network, running any major OS, to get a working IPv6 connection with nothing but a few lines of shell commands. This makes it very easy for developers to start getting familiar with the protocol, with minimal administrative overhead.
dcsbwt is a data compressor program and library based on the Burrows-Wheeler transform.
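For readers new to the Burrows-Wheeler transform, here is a deliberately naive Python sketch: sort all rotations of the input, keep the last column, and invert by repeatedly prepending that column and re-sorting. It assumes a `$` end-of-text sentinel that does not occur in the input, and it is far slower than a real compressor such as dcsbwt, which would build the transform with suffix sorting and follow it with further coding stages.

```python
SENTINEL = "$"  # assumed end marker: sorts before letters and must not appear in the input


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (O(n^2 log n), illustration only)."""
    if SENTINEL in text:
        raise ValueError("input must not contain the sentinel")
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str) -> str:
    """Invert the transform by rebuilding the sorted rotation table column by column."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original text is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(SENTINEL))
    return row[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)               # annb$aa
    print(inverse_bwt(encoded))  # banana
```

The transform does not shrink anything by itself; it groups equal characters together (note the runs in `annb$aa`), which later stages can then compress well.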
DepAn: Dependency visualization and analysis
DepAn is a direct manipulation tool for visualization, analysis, and refactoring of dependencies in large applications. Check out the User Guide.
Google mobwrite
MobWrite converts forms and web applications into collaborative environments. Create a simple single-user system, add one line of JavaScript, and instantly get a collaborative system.
An encoder and decoder for the format described in RFC 3284: "The VCDIFF Generic Differencing and Compression Data Format." The encoding strategy is largely based on Bentley-McIlroy 99: "Data Compression Using Long Common Strings." A library with a simple API is included, as well as a command-line executable that can apply the encoder and decoder to source, target, and delta files. A slight variation from the draft standard is defined to allow chunk-by-chunk decoding when only a partial delta file window is available.
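The core idea behind VCDIFF-style delta encoding is to describe the target file as a sequence of COPY instructions (reuse bytes from the source) and ADD instructions (literal new bytes). The Python sketch below shows only that idea, using a hash of aligned fixed-size source blocks in the spirit of Bentley-McIlroy. The instruction tuples are hypothetical and are not the RFC 3284 wire format or this library's API, and the block size of 4 is an arbitrary assumption.

```python
BLOCK = 4  # assumed fingerprint block size; real encoders tune this


def encode_delta(source: bytes, target: bytes):
    """Return a list of ("COPY", offset, length) and ("ADD", data) instructions."""
    # Index aligned blocks of the source (Bentley-McIlroy fingerprints aligned blocks).
    index = {}
    for off in range(0, len(source) - BLOCK + 1, BLOCK):
        index.setdefault(source[off:off + BLOCK], off)

    instructions, pending, i = [], bytearray(), 0
    while i < len(target):
        off = index.get(target[i:i + BLOCK])
        if off is None:
            pending.append(target[i])  # no match here: accumulate a literal byte
            i += 1
            continue
        # Extend the match forward as far as source and target agree.
        length = BLOCK
        while (off + length < len(source) and i + length < len(target)
               and source[off + length] == target[i + length]):
            length += 1
        if pending:
            instructions.append(("ADD", bytes(pending)))
            pending = bytearray()
        instructions.append(("COPY", off, length))
        i += length
    if pending:
        instructions.append(("ADD", bytes(pending)))
    return instructions


def decode_delta(source: bytes, instructions) -> bytes:
    """Rebuild the target by replaying COPY and ADD instructions against the source."""
    out = bytearray()
    for ins in instructions:
        if ins[0] == "COPY":
            _, off, length = ins
            out += source[off:off + length]
        else:
            out += ins[1]
    return bytes(out)
```

For example, with source `b"The quick brown fox jumps over the lazy dog"` and target `b"The quick brown cat jumps over the lazy dog!"`, the delta is two COPY instructions plus short ADDs for `cat ` and `!`, which is much smaller than the target itself.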
Update Engine is a flexible Mac OS X framework that can help developers keep their products up-to-date. It can update nearly any type of software, including Cocoa apps, screen savers, and preference panes. It can even update kernel extensions, regular files, and root-owned applications. Update Engine can even update multiple products just as easily as it can update one.
Google Sitemap Generator
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. By creating and submitting Sitemaps to search engines, you are more likely to get better freshness and coverage in search engines. Google Sitemap Generator is a tool installed on your web server to generate the Sitemaps automatically. Unlike many other third party Sitemap generation tools, Google Sitemap Generator takes a different approach: it will monitor your web server traffic, and detect updates to your website automatically.
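For reference, the Sitemap files such a generator produces follow the sitemaps.org XML protocol. Below is a minimal sketch using only Python's standard library; the URLs and dates are placeholders, and this is not how Google Sitemap Generator itself builds its output.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """pages: iterable of (url, last_modified) pairs -> Sitemap XML document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # ElementTree escapes & and <
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date format, e.g. 2010-03-01
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2010-03-01"),
        ("http://www.example.com/about.html", "2010-02-15"),
    ]))
```

The interesting part of a generator is deciding which URLs and dates to list (here, by watching server traffic); the file format itself is this simple.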
Google Pose Optimizer
The Google pose optimizer (GPO) is a C++ library that allows reconstruction of the pose of a sensor platform (i.e. its position and orientation over time) based on information from sensors such as GPS, accelerometers and rate gyroscopes. GPO does not provide real-time localization in the way that a Kalman filter would; instead, it generates the pose as the result of a large offline optimization, which produces better results. Here is the wiki.
Google dnswall
dnswall is a daemon that filters out private IP addresses in DNS responses. It is designed to be used in conjunction with an existing recursive DNS resolver in order to protect networks against DNS rebinding attacks. For details of the attack and various defenses, including dnswall, see http://crypto.stanford.edu/dns/.
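The filtering rule dnswall applies can be illustrated with Python's standard `ipaddress` module: drop any A/AAAA answer that points into private, loopback or link-local space, so an attacker-controlled name cannot be rebound to an internal host. This is a simplified sketch over already-parsed answers, not dnswall's code; parsing and rewriting real DNS packets is omitted, and the exact set of blocked ranges is an assumption.

```python
import ipaddress


def is_internal(address: str) -> bool:
    """True if the address should never appear in an answer for a public name."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified


def filter_answers(answers):
    """Keep only answers that point at public addresses.

    answers: list of (hostname, address) pairs taken from A/AAAA records.
    """
    return [(name, addr) for name, addr in answers if not is_internal(addr)]


if __name__ == "__main__":
    response = [
        ("attacker.example", "8.8.8.8"),
        ("attacker.example", "192.168.1.1"),
        ("attacker.example", "127.0.0.1"),
        ("attacker.example", "fe80::1"),
    ]
    print(filter_answers(response))  # only the 8.8.8.8 answer survives
```

Because the check runs on the resolver side, it protects every browser on the network without relying on each browser's own DNS-pinning behaviour.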
Google timezone
Choose from a list of major cities around the world or define your own if it's not on the list. Set one of six layouts for your clocks and choose a design and a background for each clock independently. Add up to 15 clocks and never lose track of time again.
Radiohead ;)
Go here for details
GeN - an open-source system for learning generative models of relational data.


  1. pretiffy -> prettify

  2. Very enlightening! SpriteMe was the most interesting to me!

  3. You forgot about guice

  4. http://emend.appspot.com/sites/blog.0x1fff.com/edits/0

    “easy "CSS spirtes"” should be “easy "CSS sprites"”

  5. Why is everything in italics?

  6. Thank you for sharing. Most of them I knew about but it is nice to have one page that has them all :)

  7. These are not projects created or contributed to by Google, just projects hosted on Google Code, where anyone can host a project.

  8. @Anonymous #1 - corrected spelling error.
    @Anonymous #2 - Thanks for info about guice
    @John Tantalo - corrected spelling error (nice site for showing errors)
    @Ben - most of the content is quoted from project pages, so I have used the HTML cite tag (the rest is done by CSS; by default the cite tag renders everything in italics)

    @Anonymous #3 - nope, all are contributed by Googlers (Google employees) and were started at Google, sometimes as a 20% project, sometimes not ;)

  9. Don't forget slack - Slack (http://code.google.com/p/slack/) and Ganeti! (http://code.google.com/p/ganeti/)

  10. What about Google Gears? http://gears.google.com/

  11. Two other Google libraries that Chromium uses are Skia ( http://code.google.com/p/skia/ ), a 2D graphics library, and GURL ( http://code.google.com/p/google-url/ ), a URL parsing/canonicalizing library.

  12. What about Google Web Toolkit?

  13. How do you feel about libjingle?

  14. These Open source projects are just to attract developers to their fold from Microsoft and Apple.

  15. Please Add this also
    LotREPLs is a multi-lingual read-eval-print-loop in your browser powered by Google App Engine and the Java runtime. It's a technical demo, not something to do serious work with.

  16. A notable mention would also be Google App Engine's Python and Java SDKs, whose development environment source code is available for scrutiny and comes with a working HTTP server. The actual runtime on Google's servers isn't open, but quite a few projects have sprung up offering compatibility with code written for the SDK.

  17. This project is in comments because it wasn't created by Googlers.

    It is a configuration management system designed to appeal to lazy admins. It was not written by Google, but its README file includes the statement: "Googlers: gave me feature requests, bug reports".

    Thanks Matt Brown for finding this.

  18. Don't forget Google Gadgets for Linux:)

  19. http://code.google.com/p/unladen-swallow/

  20. i hate google

  21. That's awesome, greetings from the Silesian University of Technology.

  22. Not a Google project, but it helps to create database application prototypes and then creates code for whatever you need.


  23. Good collection! Thanks

  24. What about Closure Tools?


  25. Isn't SPDY also a Google project?


  26. google-gin GIN (GWT INjection) is Guice for Google Web Toolkit client-side code http://code.google.com/p/google-gin/

    gwt-google-apis The Official Google API Libraries for Google Web Toolkit http://code.google.com/p/gwt-google-apis/

    datanucleus-appengine DataNucleus plugin for Google App Engine http://code.google.com/p/datanucleus-appengine/

    google-web-toolkit-incubator The Official incubator of widgets and libraries for Google Web Toolkit http://code.google.com/p/google-web-toolkit-incubator/

    rietveld Code Review for Subversion, hosted on Google App Engine http://code.google.com/p/rietveld/

    youtube-direct YouTube Direct Platform http://code.google.com/p/youtube-direct

    js-test-driver Remote javascript console http://code.google.com/p/js-test-driver/

  27. How about iRedMail? I LOVE that installation script, as it turns a long installation and configuration process into a two-minute breeze!

  28. Thanks everyone for very positive feedback, and submissions of cool projects (this list is much longer now).

    @Tica2: Thanks for your list, I will merge it with mine later ;D

    @Adrian H.: This tool looks nice, but I can't see any Googlers contributing to this project - this list is only about projects created at Google or projects contributed by Google/Googlers in their "20% project".

  29. Thanks for the list. Here's a new candidate worth including in this list: Go language (http://golang.org)

  30. Google Wave Protocol
    This project contains the draft specification for the Google Wave Federation Protocol and the Java source code for the Google Wave Federation Prototype Server.

  31. pubsubhubbub : server-to-server web-hook-based pubsub (publish/subscribe) protocol as an extension to Atom and RSS.


  32. http://code.google.com/p/living-stories/ - Living Stories are a new format for presenting and consuming online news. The basic idea of a living story is to combine all of the news coverage on a running story on a single page. Every day, instead of writing a new article on the story that sits at a new URL and contains some new developments and some old background, a living story resides at a permanent URL, that is updated regularly with new developments. This makes it easier for readers to get the latest updates on the stories that interest them, as well as to review deeper background materials that are relevant for a story's context.

    To see Living Stories in action, go to http://livingstories.googlelabs.com and click on one of the listed stories

  33. http://code.google.com/p/remail-iphone/

    reMail was recently acquired by Google, which then released the product as open source. reMail downloads all your email to your iPhone and searches it instantly.

  34. http://code.google.com/p/re2/

    RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library developed at Google. From the announcement: "On large inputs, RE2 is often much faster than backtracking engines; its use of automata theory lets it apply optimizations that the others cannot." Read the full entry at: http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
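    The advantage quoted above comes from simulating an automaton instead of backtracking. Below is a toy Python sketch of that idea for a tiny regex subset (literal characters, `.` and a postfix `*`, whole-string match only): it tracks the set of all pattern positions that are still alive, so the running time is O(len(text) × len(pattern)) with no exponential blow-up. It illustrates the technique only and is not RE2's implementation or API.

```python
def compile_pattern(pattern: str):
    """Turn e.g. 'a*b.' into [('a', True), ('b', False), ('.', False)]."""
    tokens, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def closure(states, tokens):
    """Add states reachable without consuming input (skipping starred tokens)."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        if s < len(tokens) and tokens[s][1] and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result


def full_match(pattern: str, text: str) -> bool:
    """Simulate the NFA on all live states at once: no backtracking."""
    tokens = compile_pattern(pattern)
    current = closure({0}, tokens)
    for ch in text:
        nxt = set()
        for s in current:
            if s < len(tokens) and tokens[s][0] in (ch, "."):
                # A starred token may repeat (stay put); a plain one advances.
                nxt.add(s if tokens[s][1] else s + 1)
        current = closure(nxt, tokens)
        if not current:
            return False
    return len(tokens) in current  # accept if the end of the pattern is reachable
```

    A pattern such as `"a*" * 30 + "b"` against thirty `a`s makes a naive backtracking matcher explore a huge number of ways to split the input, while this simulation never holds more than 31 states at a time.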

  35. skipfish

    A fully automated, active web application security reconnaissance tool. Key features:

    * High speed: pure C code, highly optimized HTTP handling, minimal CPU footprint - easily achieving 2000 requests per second with responsive targets.

    * Ease of use: heuristics to support a variety of quirky web frameworks and mixed-technology sites, with automatic learning capabilities, on-the-fly wordlist creation, and form autocompletion.

    * Cutting-edge security logic: high quality, low false positive, differential security checks, capable of spotting a range of subtle flaws, including blind injection vectors.

    The tool is believed to support Linux, FreeBSD, MacOS X, and Windows (Cygwin) environments.

  36. Quake 2 GWT Port


    The GWT Quake II port brings the 3d gaming experience of Quake II to the browser.

    In the port, we use WebGL, the Canvas API, HTML 5 elements, the local storage API, and WebSockets to demonstrate the possibilities of pure web applications in modern browsers such as Safari and Chrome.

    The port is based on the Jake2 project, compiled to Javascript using the Google Web Toolkit (GWT). Jake 2 is a Java port of the original Quake II source code, which was open sourced by id Software.

  37. GAG - http://code.google.com/p/gag/

    Google Annotations Gallery

    The Google Annotations Gallery is an exciting new Java open source library that provides a rich set of annotations for developers to express themselves. Do you find the standard Java annotations dry and lackluster? Have you ever resorted to leaving messages to fellow developers with the @Deprecated annotation? Wouldn't you rather leave a @LOL or @Facepalm instead? If so, then this is the gallery for you.

  38. http://code.google.com/p/browserscope/

    Browserscope is an open-source project for profiling web browsers and storing and aggregating crowd-sourced data about browser performance.

    The goals are to foster innovation by tracking browser functionality and to be a resource for web developers.

  39. This comment has been removed by a blog administrator.

  40. http://www.webmproject.org/

    The WebM project is dedicated to developing a high-quality, open video format for the web that is freely available to everyone.

    The WebM launch is supported by Mozilla, Opera, Google and more than forty other publishers, software and hardware vendors.

    This package includes VP8, a high-quality video codec that Google acquired when it purchased On2.

  41. http://code.google.com/p/cloudcourse/

    CloudCourse is a course scheduling system.

    Built entirely on App Engine, CloudCourse allows anyone to create and track learning activities. It also offers calendaring, waitlist management and approval features.

  42. http://code.google.com/p/mytracks/

    My Tracks records your GPS tracks and shows live statistics such as time, speed, distance, and elevation – while hiking, cycling, running or participating in other outdoor activities. Once recorded, you can share your tracks, upload them to Google Spreadsheets and visualize them on Google My Maps.


    Built on Google's App Engine, CloudCourse is a course-scheduling tool, fully integrated with Google Calendar. CloudCourse also features approval processes, wait list management, as well as room and user profile information and can be further customized to sync the data with other internal systems.

    Google hopes that by releasing this under an open-source license that it can "help developers who want to port or build enterprise applications on App Engine."


    Thoughtsite is a discussion/forum web app designed for Google App Engine. The main features of the app are:
    * A flexible system that can be used for any kind of discussion forum.
    * Voting, tagging, comments and a reputation point system for users.
    * Full-text search on App Engine with Apache Lucene.
    * Search for threads by tags or by keywords. Threads can also be linked to from user profiles.
    * Users gain reputation points based on community votes for their contributions.
    * Full-fledged user profiles with info, points, contributions, the user's personal tag cloud, etc.
    * Basic duplicate-detection filters that find similar threads, so posters can avoid creating a new thread if one already exists.
    * Basic spam and gaming filters (self-voting, cross-voting, etc.).
    * A comprehensive admin section that allows moderation of individual posts and users. Users can flag objectionable content or trolls.

  43. https://groups.google.com/group/google-appengine/web/google-app-engine-open-source-projects?pli=1

    Open sourced projects for AppEngine

  44. that's a lot of list bro, this will take me time to check them all out.


  45. Rietveld is a code review tool for Subversion, hosted on Google App Engine.

    This project shows how to create a somewhat substantial web application using Django on Google App Engine.

    In addition, the author hopes he has created a practical tool for the Python developer community and other open source communities.

    Some code in this project was derived from Mondrian, but this is not the full Mondrian tool.

    The project was created by Guido van Rossum, the creator of Python and a Google employee.

  46. http://code.google.com/p/szl/

    Szl is a compiler and runtime for the Sawzall language. It includes support for statistical aggregation of values read or computed from the input. Google uses Sawzall to process log data generated by Google's servers.

    Since a Sawzall program processes one record of input at a time and does not preserve any state (values of variables) between records, it is well suited for execution as the map phase of a map-reduce. The library also includes support for the statistical aggregation that would be done in the reduce phase of a map-reduce.
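    The record-at-a-time model described above can be sketched in Python: a stateless function turns each log record into (table, key, value) emissions, and a separate step aggregates them the way Sawzall's `sum` tables would in the reduce phase. The log format, table names and helper functions are hypothetical; this mimics the execution model only, not Sawzall syntax or the szl runtime.

```python
from collections import defaultdict


def map_record(line: str):
    """Process ONE record, keeping no state between calls (like a Sawzall program).

    Assumed record format: "<client_ip> <status_code> <bytes_sent>".
    """
    ip, status, size = line.split()
    yield ("requests", status, 1)         # count requests per status code
    yield ("bytes_by_ip", ip, int(size))  # total bytes sent per client


def reduce_sums(emissions):
    """Aggregate emissions per table and key, as a sum table does in the reduce phase."""
    tables = defaultdict(lambda: defaultdict(int))
    for table, key, value in emissions:
        tables[table][key] += value
    return {name: dict(rows) for name, rows in tables.items()}


if __name__ == "__main__":
    log = [
        "10.0.0.1 200 512",
        "10.0.0.2 404 128",
        "10.0.0.1 200 1024",
    ]
    emissions = (e for line in log for e in map_record(line))
    print(reduce_sums(emissions))
```

    Because `map_record` never looks at any record other than its argument, the log can be split across any number of machines and the partial emissions merged afterwards.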

  47. I had no idea Google was up to all these things. Thanks for sharing the info!

  48. http://code.google.com/p/googlepersonfinder/

    googlepersonfinder - Searchable missing person database based on PFIF. An App Engine app in Python.

  49. Gulliver is an open source platform that enables users to plan trips in realtime with their friends. It powers Trippy, the Google Wave Extension built with Lonely Planet content. In the near future, we hope to enable it for mobile, iGoogle, and opensocial platforms.

  50. Apache Wave (previously Google Wave): http://wiki.apache.org/incubator/WaveProposal

    Apache Wave is the project where wave technology is developed at Apache. Wave in a Box (WIAB) is the name of the main product at the moment, which is a server that hosts and federates waves, supports extensive APIs, and provides a rich web client. This project also includes an implementation of the Wave Federation protocol, to enable federated collaboration systems (such as multiple interoperable Wave In a Box instances).

  51. The CityHash family of hash functions.

    CityHash provides hash functions for strings. The functions mix the input bits thoroughly but are not suitable for cryptography (MIT license).

  52. WebRTC is an open source project that enables web browsers with Real-Time Communications (RTC) capabilities via simple Javascript APIs. The WebRTC components have been optimized to best serve this purpose.

    Project Page

    Google code project

  53. Using Rat Proxy for ages, great tool.
    You can pretty much get a payback for the one who infected you, by knowing a lot about him.

  54. Wow nice information you have shared here. Actually Google made searching of information easy on any topic. Well keep it up and post more interesting blogs.Thanks for sharing.

  55. Is this Google projects? Its really great information. Google has always gives the useful information to users. Thanks for this.

  56. LevelDB: A Fast Persistent Key-Value Store

    LevelDB is a fast key-value storage engine written at Google that provides an ordered mapping from string keys to string values. We are pleased to announce that we are open sourcing LevelDB under a BSD-style license.

    LevelDB is a C++ library that can be used in many contexts. For example, LevelDB may be used by a web browser to store a cache of recently accessed web pages, or by an operating system to store the list of installed packages and package dependencies, or by an application to store user preference settings. We designed LevelDB to also be useful as a building block for higher-level storage systems.

    LevelDB Benchmark
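    What distinguishes an ordered key-value store from a hash table is that keys are kept sorted, so range and prefix scans are cheap. The in-memory Python toy below illustrates only that property with the standard `bisect` module; the `OrderedStore` class is hypothetical and is not LevelDB's API (the real library is C++, persists data on disk, and exposes its own Put/Get/Delete calls and iterators).

```python
import bisect


class OrderedStore:
    """Toy ordered mapping from string keys to string values (in memory only)."""

    def __init__(self):
        self._keys = []    # always kept sorted
        self._values = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving order
        self._values[key] = value

    def get(self, key: str):
        return self._values.get(key)

    def delete(self, key: str) -> None:
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start: str, end: str):
        """Yield (key, value) for start <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1
```

    For example, storing preference keys such as `user:alice` and `user:bob` lets `scan("user:", "user;")` return all user entries in order, because `;` is the character right after `:`. This is the kind of access pattern (package lists, cached pages, settings) the description above has in mind.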


  57. snappy: A fast compressor/decompressor

    Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

    Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

    Snappy is written in C++, but C bindings are included, and several bindings to other languages are maintained by third parties:
    * Common Lisp
    * Erlang: esnappy, snappy-erlang-nif
    * Go
    * Haskell
    * Java: JNI wrapper, native reimplementation
    * Node.js
    * Perl
    * Python
    * Ruby

  58. Google JS Test is a JavaScript unit testing framework that runs on the V8 JavaScript Engine, the same open source project that is responsible for Google Chrome’s super-fast JS execution speed. Google JS Test is used internally by several Google projects, but now it has been released as an open source project.

    Features of Google JS Test include:
    * Extremely fast startup and execution time, without needing to run a browser.

    * Clean, readable output in the case of both passing and failing tests.

    * An optional browser-based test runner that can simply be refreshed whenever JS is changed.

    * Style and semantics that resemble Google Test for C++.

    * A built-in mocking framework that requires minimal boilerplate code (e.g. no $tearDown or $verifyAll calls), with style and semantics based on the Google C++ Mocking Framework.

    * A system of matchers allowing for expressive tests and easy to read failure output, with many built-in matchers and the ability for the user to add their own.

    The trade-off is that since tests are run in V8 without a browser, there is no DOM available. You can still use Google JS Test for tests of DOM-manipulating code, however.

  59. Google Refine

    Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.

    A short description of the project is here [PL]

  60. Dart language - page at code.google.com

    Google's alternative to JavaScript is no longer called Dash; the name has been changed to Dart. Designed as an object-oriented programming language that's both flexible and structured, Dart should be familiar to Java and C++ programmers, while inheriting some advantages of scripting languages like JavaScript.

    Dart is "a class-based optionally typed programming language for building web applications" and Google says that it's better suited for large-scale projects. "Developed with the goals of simplicity, efficiency, and scalability, the Dart language combines powerful new language features with familiar language constructs into a clear, readable syntax."

  61. NativeDriver

    NativeDriver is an implementation of the WebDriver API which drives the UI of a native application rather than a web application. Android version is available for download. An iPhone (iOS) / WindowsMobile version is under development and will be available soon.

    The authors plan to make NativeDriver a multi-platform tool. The current version works on the following platforms:

    * Android - usable and in the SVN repository.
    * iOS - usable and in the SVN repository.
    * Windows - Experimental and prototype phase

  62. Qualitybots - A tool for automated comparison of website layouts across multiple Chrome versions.

    QualityBots is a tool that allows users to create batch webpage comparisons across Chrome channels by utilizing EC2 machines. The resulting comparisons will help webpage designers and Chrome app developers understand how their work will fare in future Chrome versions. In this way, one can be well prepared for change and handle it in a timely fashion.

    QualityBots utilizes both App Engine and Amazon EC2.

  63. Protocol Buffers Development Tools

    Google's Eclipse-based Development Environment for Protocol Buffers - protobuf

  64. Android Mock

    Android Mock is a framework for mocking interfaces and classes on the Dalvik VM.

    Android Mock is written on top of EasyMock 2.4, reproducing the same grammar, syntax and usage patterns. Android Mock generates mocks at compile time, which are then used at run time.

  65. address-sanitizer

    Address Sanitizer (ASAN) is a fast memory error detector based on compiler instrumentation. The current version is based on the LLVM compiler. It finds use-after-free and out-of-bound bugs in C/C++ programs.

    The run-time library replaces the malloc and free functions. The memory around malloc-ed regions (red zones) is poisoned. The free-ed memory is placed in quarantine and also poisoned. Every memory access in the program is transformed by the compiler.

    malloc allocates the requested amount of memory with redzones around it. The shadow values corresponding to the redzones are poisoned and the shadow values for the main memory region are cleared.

    free poisons shadow values for the entire region and puts the chunk of memory into a quarantine queue (such that this chunk will not be returned again by malloc during some period of time).
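    The shadow-memory bookkeeping described above can be modeled in a few lines of Python. This toy assumes one shadow flag per byte (real ASan maps each 8 application bytes to one shadow byte), a fixed redzone size, a bump allocator and a bounded quarantine; the `ToyAsan` class and its methods are hypothetical and only mirror the poison/unpoison/quarantine logic, not the compiler instrumentation.

```python
from collections import deque

REDZONE = 16        # assumed redzone size in bytes on each side of a chunk
QUARANTINE_MAX = 4  # assumed number of freed chunks held back from reuse


class ToyAsan:
    def __init__(self, heap_size: int = 1024):
        self.poisoned = [True] * heap_size  # shadow: True means "access is a bug"
        self.next_free = 0                  # bump allocator, never reuses memory
        self.chunks = {}                    # user address -> size
        self.quarantine = deque()

    def malloc(self, size: int) -> int:
        start = self.next_free + REDZONE         # left redzone stays poisoned
        for a in range(start, start + size):
            self.poisoned[a] = False             # unpoison only the usable region
        self.next_free = start + size + REDZONE  # right redzone stays poisoned
        self.chunks[start] = size
        return start

    def free(self, addr: int) -> None:
        size = self.chunks.pop(addr)
        for a in range(addr, addr + size):
            self.poisoned[a] = True              # poison the whole region again
        self.quarantine.append((addr, size))     # delay reuse of this memory
        if len(self.quarantine) > QUARANTINE_MAX:
            self.quarantine.popleft()            # oldest chunk could be recycled (not modeled)

    def is_poisoned(self, addr: int) -> bool:
        return self.poisoned[addr]

    def check(self, addr: int) -> None:
        """What the compiler-inserted check does before every load or store."""
        if self.is_poisoned(addr):
            raise MemoryError(f"invalid access at address {addr}")
```

    With `p = asan.malloc(8)`, accesses to `p` through `p + 7` pass, `p - 1` and `p + 8` land in redzones and fail (out-of-bounds), and after `asan.free(p)` any access to `p` fails (use-after-free).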

  66. Google Guava

    The Guava project contains several of Google's core libraries that Google relies on in its Java-based projects: libraries for collections, caching, primitives support, concurrency, common annotations, string processing, I/O, and so forth.

    Here you can find blog post about Guava.

  67. Google Presentation Templates in HTML5: presentations based on these templates were used at Google I/O 2011. Sample presentation

  68. GRR is an incident response framework focused on remote live forensics. GRR is used at Google to check what is installed on users' computers and which computers could be affected by an attack.
    Abstract from the paper describing GRR:

    Remote live forensics has recently been increasingly used in order to facilitate rapid remote access to enterprise machines. We present the GRR Rapid Response Framework (GRR), a new multi-platform, open source tool for enterprise forensic investigations enabling remote raw disk and memory access. GRR is designed to be scalable, opening the door for continuous enterprise wide forensic analysis. This paper describes the architecture used by GRR and illustrates how it is used routinely to expedite enterprise forensic

  69. Google Sky Map: “window on the sky” for Android phones - open source version.

    Point your phone at the sky, and Google Sky Map will show the stars, planets, constellations, and more to help you identify the celestial objects in view. You can also browse the skies in manual mode.

  70. sfntly - A Library for Using, Editing, and Creating SFNT-based Fonts. Java and C++ libraries for using, editing, and creating sfnt container based fonts (e.g. OpenType, TrueType).

    The basic features of sfntly are the reading, editing, and writing of an sfnt container font. Fonts that use an sfnt container include OpenType, TrueType, AAT/GX, and Graphite. sfntly isn't itself a tool that is usable by an end user - it is a library that allows software developers to build tools that manipulate fonts in ways that haven't been easily accessible to most developers. The sfntly library is available in Java with a partial C++ port. However, we have included some font tools that are built on top of sfntly: a font subsetter, a font dumper, a font linter, and some compression utilities.

  71. leak-finder-for-javascript

    Leak Finder for JavaScript works against the Developer tools remote inspecting protocol of Chrome, retrieves heap snapshots, and detects objects which are "memory leaks" according to a given leak definition.

    In JavaScript you cannot have "memory leaks" in the traditional sense, but you can have objects which are unintentionally kept alive and which in turn keep alive other objects, e.g., large parts of DOM.

  72. Course Builder is an experimental first step in the world of online education. It packages the software and technology used to build the Power Searching with Google online course. Using this software, you could create anything from an entire high school or university offering to a short how-to course on your favorite topic.

    Course Builder contains software and instructions for presenting your course material, which can include lessons, student activities, and assessments. It also contains instructions for using other Google products to create a course community and to evaluate the effectiveness of your course. To use Course Builder, you should have some technical skills at the level of a web master. In particular, you should have some familiarity with HTML and JavaScript.

  73. cpp-btree

    C++ B-tree is a template library that implements ordered in-memory containers based on a B-tree data structure. Similar to the STL map, set, multimap, and multiset templates, this library provides btree_map, btree_set, btree_multimap, and btree_multiset.

    C++ B-tree containers have a few advantages compared with the standard containers, which are typically implemented using Red-Black trees. Nodes in a Red-Black tree require three pointers per entry (plus 1 bit), whereas B-trees on average make use of fewer than one pointer per entry, leading to significant memory savings. For example, a set has an overhead of 16 bytes for every 4 byte set element (on a 32-bit operating system); the corresponding btree_set has an overhead of around 1 byte per set element.

    B-trees are widely known as data structures for secondary storage, because they keep disk seeks to a minimum. For an in-memory data structure, the same property yields a performance boost by keeping cache-line misses to a minimum. C++ B-tree containers make better use of the cache by performing multiple key-comparisons per node when searching the tree. Although B-tree algorithms are more complex, compared with the Red-Black tree algorithms, the improvement in cache behavior may account for a significant speedup in accessing large containers.

    The C++ B-tree containers are not without drawbacks, however. Unlike the standard STL containers, modifying a C++ B-tree container invalidates all outstanding iterators on that container. For this reason, the library also contains "safe" variations on the four containers: iterators on safe B-tree containers keep a copy of the current key and automatically reposition the iterator whenever it is used.

    Link to release information: http://google-opensource.blogspot.com/2013/01/c-containers-that-save-memory-and-time.html
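    The memory figures quoted above can be reproduced with back-of-the-envelope arithmetic. The sketch assumes a 32-bit platform (4-byte pointers), a red-black node holding three pointers plus a colour flag padded to 4 bytes, and, for the B-tree, a hypothetical 256-byte leaf node with an 8-byte header that is on average 75% full. These layout numbers are assumptions for illustration, not the library's exact node format.

```python
POINTER = 4  # bytes per pointer on a 32-bit system


def rb_tree_overhead() -> int:
    """Per-element overhead of a red-black tree node: left, right and parent
    pointers plus the 1-bit colour, assumed padded to a full 4-byte word."""
    return 3 * POINTER + 4


def btree_leaf_overhead(value_size: int, node_size: int = 256,
                        header: int = 8, fill: float = 0.75) -> float:
    """Average per-element overhead in B-tree leaves (internal nodes ignored):
    the node header plus unused value slots, spread over the stored values."""
    slots = (node_size - header) // value_size
    stored = slots * fill
    return (node_size - stored * value_size) / stored


if __name__ == "__main__":
    print("red-black:", rb_tree_overhead(), "bytes per element")        # 16
    print("B-tree leaf: %.2f bytes per element" % btree_leaf_overhead(4))  # 1.51
```

    Under these assumptions the B-tree comes out at roughly 1.5 bytes per 4-byte element (about 0.13 with completely full leaves), the same order of magnitude as the "around 1 byte" quoted above, against 16 bytes for the red-black tree.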

