27 Dec 2009

35 Google open-source projects that you probably don't know

This text is a translation of: 34 projekty Open Source udostępnione przez Google

Update:

The list is now longer than 35 projects. While translating it from Polish to English I added one new project, which is why the title says 35 instead of 34 ;). After the updates there are even more! Sorry for the confusion.

Google is one of the biggest companies supporting the open source movement. It has released more than 500 open source projects (most of them are samples showing how to use its APIs). In this article I will try to describe the most interesting free releases from Google; some of them might be abandoned.

Update:

A list of projects developed at Google and released as open source (thanks @dobs from reddit) can also be found here

Text File processing

Google CRUSH (Custom Reporting Utilities for SHell)
CRUSH is a collection of tools for processing delimited-text data from the command line or in shell scripts. A tutorial on how to use it is here.

C++ libraries and sources

Google Breakpad
An open-source multi-platform crash reporting system. Breakpad is a minidump-generation library used for snapshotting processes out in the field for later analysis. The format is similar to core files but was developed by Microsoft for its crash-uploading facility. A minidump-creation library for Mac/Linux has been implemented so that the crash-processing back-end only needs to understand one format.
Google GFlags
The gflags package contains a library that implements commandline flags processing. As such it's a replacement for getopt(). It has increased flexibility, including built-in support for C++ types like string. Here is an introduction on how to use it.
Google Glog
The glog library implements application-level logging. This library provides logging APIs based on C++-style streams and various helper macros. It can be used under Linux, BSD, and Windows. Here is an introduction on how to use Glog.
Google PerfTools
These tools are for use by developers so that they can create more robust applications. They are especially useful to those developing multi-threaded applications in C++ with templates. Includes TCMalloc, heap-checker, heap-profiler and cpu-profiler. Instructions on how to use PerfTools can be found here and here.
Google Sparse Hash
An extremely memory-efficient hash_map implementation. 2 bits/entry overhead. The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed. The Google sparsehash package consists of two hashtable implementations: sparse, which is designed to be very space efficient, and dense, which is designed to be very time efficient. For each one, the package provides both a hash-map and a hash-set, to mirror the classes in the common STL implementation. Docs are here.
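One common way to get such a low per-entry overhead is to keep a bitmap of occupied slots next to a packed array that holds only the values actually present. The sketch below illustrates that general idea in Python. It is not the library's code: the class name, the method names and the group size of 48 are assumptions made for this example.

```python
class SparseGroup:
    """A fixed-size group of slots that stores only the occupied ones.

    Illustration only: names and the default size are made up for this sketch.
    """

    def __init__(self, size: int = 48):
        self.size = size
        self.bitmap = 0    # bit i is set when slot i is occupied
        self.values = []   # values of occupied slots, kept in slot order

    def _rank(self, slot: int) -> int:
        # Number of occupied slots that come before `slot`.
        return bin(self.bitmap & ((1 << slot) - 1)).count("1")

    def _check(self, slot: int) -> None:
        if not 0 <= slot < self.size:
            raise IndexError(slot)

    def set(self, slot: int, value) -> None:
        self._check(slot)
        index = self._rank(slot)
        if (self.bitmap >> slot) & 1:
            self.values[index] = value          # overwrite existing entry
        else:
            self.values.insert(index, value)    # grow the packed array
            self.bitmap |= 1 << slot

    def get(self, slot: int, default=None):
        self._check(slot)
        if (self.bitmap >> slot) & 1:
            return self.values[self._rank(slot)]
        return default
```

Empty slots cost one bit each in the bitmap, and the price is that inserting into the packed array is slower than writing into a plain array, which matches the space-versus-time trade-off described above.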
Omaha - Google Update
Omaha, otherwise known as Google Update, is a program to install requested software and keep it up to date. So far, Omaha supports many Google products for Windows, including Google Chrome and Google Earth, but there is no reason for it to only support Google products. Here are the Omaha Overview and the Developers Setup Guide.
Protocol Buffers
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats. Here is the developer guide. Protocol Buffers can be used from many languages and are supported by a few IDEs, for example NetBeans.
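Part of what makes the encoding compact is that integers are written as base-128 "varints": seven bits of the number per byte, least significant group first, with the high bit of each byte saying whether more bytes follow. Here is a minimal Python sketch of that one piece. The function names are my own and this is not the Protocol Buffers API, just the wire-level idea for non-negative integers.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


print(encode_varint(300).hex())  # ac02
```

Small numbers take a single byte, so fields that usually hold small values stay small on the wire.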

The Internet

Google Code Prettify
A JavaScript module and CSS file that allows syntax highlighting of source code snippets in an HTML page. It supports: C/C++, Java, Python, Ruby, PHP, VisualBasic, AWK, Bash, SQL, HTML, XML, CSS, JavaScript, Makefiles and some Perl. Not supported: Smalltalk and all *CAML*. For an example, click here.
SpriteMe - easy "CSS sprites"
SpriteMe makes it easy to create CSS sprites (combining many small images into one larger image to reduce the number of requests to the web server when loading a page). This project is also available as a service at: http://spriteme.org/.
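To show what a CSS sprite boils down to, here is a small Python sketch that stacks images vertically and prints the CSS rule each one needs. It is only an illustration of the technique, not how SpriteMe works internally; the function name, the class names and the file name sprite.png are all made up for this example.

```python
def build_sprite_css(images, sheet_url="sprite.png"):
    """Lay images out top to bottom and return (sheet_size, css_rules).

    `images` is a list of (css_class, width, height) tuples in pixels.
    """
    rules = []
    y = 0
    for css_class, width, height in images:
        # A negative offset shifts the sheet up so this image shows through.
        position = "0 0" if y == 0 else f"0 -{y}px"
        rules.append(
            f".{css_class} {{ background: url({sheet_url}) {position} no-repeat; "
            f"width: {width}px; height: {height}px; }}"
        )
        y += height
    sheet_width = max((width for _, width, _ in images), default=0)
    return (sheet_width, y), rules


size, rules = build_sprite_css([("icon-home", 16, 16), ("icon-mail", 16, 12)])
print(size)
for rule in rules:
    print(rule)
```

Both icons now come from one image file, so the browser makes one request instead of two.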
Reducisaurus
Reducisaurus is a web service for minifying and serving CSS and JS files. Reducisaurus is based on YUI Compressor and runs on AppEngine.
JaikuEngine
JaikuEngine is a social microblogging platform that runs on AppEngine. JaikuEngine powers Jaiku.com. For the mobile client source, see: Jaiku Mobile client. Here is the README for the project.
Selector Shell
The Selector Shell is a browser-based tool for testing what CSS becomes in different browsers. It works by taking some raw text, inserting a dynamic STYLE element into the HEAD with that raw text as its content, and then reading the CSSOM to see what the browser has parsed it into. It is written in JavaScript. It can be tested here.
Google Feed Server
Google Feed Server is an open source Atom Publishing Protocol server based on the Apache Abdera framework. Google Feed Server provides a simple back end for data adapters, which allows developers to quickly deploy a feed for an existing data source such as a database. Google Feed Server also provides the Feed Server Client Tool (FSCT), which lets developers perform create, retrieve, update, and delete (CRUD) operations on a Feed Server feed. Here are links to start it up and get running.
Melange, the Spice of Creation
The goal of this project is to create a framework for representing Open Source contribution workflows, such as the existing Google Summer of Code TM (GSoC) program. Using this framework, it will be possible to host future Google Summer of Code programs (and other similar programs, such as the Google Highly Open Participation TM Contest, or GHOP) on Google App Engine. Here you can check out the Getting Started Guide.
NameBench
This project hunts down the fastest DNS servers available for your computer to use. namebench runs a fair and thorough benchmark using your web browser history, tcpdump output, or standardized datasets in order to provide an individualized recommendation. namebench is completely free and does not modify your system in any way. This project began as a 20% project at Google. namebench runs on Mac OS X, Windows, and UNIX, and is available with a graphical user interface as well as a command-line interface. BTW: Google has its own free public caching DNS servers at IPs 8.8.8.8 and 8.8.4.4.
Rat Proxy
A semi-automated, largely passive web application security audit tool, optimized for accurate and sensitive detection, and automatic annotation, of potential problems and security-relevant design patterns based on the observation of existing, user-initiated traffic in complex web 2.0 environments. It detects and prioritizes broad classes of security problems, such as dynamic cross-site trust model considerations, script inclusion issues, content serving problems, insufficient XSRF and XSS defenses, and much more. Docs are here. The project is written and maintained by Michał Zalewski (lcamtuf).
TopDraw
Top Draw is an image generation program. By using simple text scripts, based on the JavaScript programming language, Top Draw can create surprisingly complex and interesting images. The cool part is that the program has built in support for taking your image and installing it as your desktop image. There's even a Viewer application that can be installed in the menubar to automatically run with the parameters (such as the selected script, update interval) that you've specified. The project is developed in Xcode, and runs on Mac OS X 10.5 (Leopard) or later.
etherpad
Open source release of EtherPad, a web-based realtime collaborative document editor. This project exists mainly as an exhibition of the code, to help support those who want to run or modify their own etherpad servers, or for those who are curious about how etherpad's algorithms make realtime collaboration possible. Here are some instructions on how to build etherpad, and a screencast showing what it is all about. Etherpad uses JavaScript, Java and a Comet server to make realtime collaboration work.
Chromium
Chromium is the open-source project behind Google Chrome. The Chromium project aims to create a powerful platform for developing a new generation of web applications. There are not many differences between Chrome and Chromium. Here are instructions on how to build Chromium on Linux. There are also official releases of Chrome for Windows, Mac and Linux.
V8 Google's open source JavaScript engine
V8 is Google's open source JavaScript engine. V8 is written in C++ and is used in Google Chrome, the open source browser from Google. V8 implements ECMAScript as specified in ECMA-262, 3rd edition, and runs on Windows XP and Vista, Mac OS X 10.5 (Leopard), and Linux systems that use IA-32 or ARM processors. V8 can run standalone, or can be embedded into any C++ application. Here are some helpful docs on how to begin.
Chromium OS
Chromium OS is an open-source project that aims to build an operating system that provides a fast, simple, and more secure computing experience for people who spend most of their time on the web. Sources are available on: http://git.chromium.org/ src
Android
Android is the first free, open source, and fully customizable mobile platform. Android offers a full stack: an operating system, middleware, and key mobile applications. It also contains a rich set of APIs that allows third-party developers to develop great applications.

Tools for MySQL

Google MySQL Tools
Various tools for managing, maintaining, and improving the performance of MySQL databases, originally written by Google. This includes:
  • mypgrep.py - a tool, similar to pgrep, for managing mysql connections
  • compact_innodb.py - compacts innodb datafiles by dumping and reloading all tables
Google mMAIM
mMAIM's purpose is to make it easy to monitor and analyze MySQL servers and to easily integrate itself into any environment. It can show Master/Slave sync stats, some efficiency stats, can return statistics from most of the "show" commands, and more!

Other projects

Stressful Application Test (stressapptest)
Stressful Application Test (or stressapptest, its Unix name) tries to maximize randomized traffic to memory from processor and I/O, with the intent of creating a realistic high load situation in order to test the existing hardware devices in a computer. It has been used at Google for some time and now it is available under the Apache 2.0 license. Here are some docs: Introduction, Installation Guide and User Guide.
Pop and IMAP Troubleshooter
The POP and IMAP troubleshooter serves to diagnose and solve connection problems from client machines to email services. It reads the client configuration files (Outlook, Windows Mail, Thunderbird, etc.), checks the individual settings, and then attempts to create POP, IMAP, and SMTP connections using these settings. The troubleshooter is coded in C++ using the Qt environment. It can be used generically, or can be customized for the demands of a particular email service.
OpenDuckBill
Openduckbill is a simple command line backup tool for Linux, which is capable of monitoring the files/directories marked for backup for any changes and transferring these changes to a local backup directory, a remote NFS exported partition, or a remote ssh server using the very common rsync command. Here is the installation guide.
ZXing
ZXing (pronounced "zebra crossing") is an open-source, multi-format 1D/2D barcode image processing library implemented in Java. Our focus is on using the built-in camera on mobile phones to photograph and decode barcodes on the device, without communicating with a server. As far as I know, it can be found on the Android platform. Check out the Getting Started guide, and check out the list of supported devices (my Sony Ericsson device is capable!).
Tesseract OCR Engine
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. A TIFF reader is built in that will read uncompressed TIFF images, or libtiff can be added to read compressed images. Here are the Readme and FAQ.
Neatx - Open Source NX server
Neatx is an Open Source NX server, similar to the commercial NX server from NoMachine. For more information check out the Project Homepage. The NX protocol is much more robust than VNC (it can be useful on a slow Internet connection). See: Major differences between NX and VNC. An alternative to the Google project is FreeNX (not tested).
PSVM
This is the code for the following paper: http://books.nips.cc/papers/files/nips20/NIPS2007_0435.pdf. It is an all-kernel-support version of SVM, which can run in parallel on multiple machines. Here is the usage information.
The Go programming language
A new programming language developed at Google. It was released with this slogan: "Go: a systems programming language, expressive, concurrent, garbage-collected"
The Google Collections Library for Java
The Google Collections Library is a set of new collection types, implementations and related goodness for Java 5 and higher, brought to you by Google. It is a natural extension of the Java Collections Framework you already know and use.
Google styleguide
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style. "Style" covers a lot of ground, from “use camelCase for variable names” to “never use global variables” to “never use exceptions.” This project holds the style guidelines we use for Google code. If you are modifying a project that originated at Google, you may be pointed to this page to see the style guides that apply to that project. This is worth reading.

Summary

Google is one of the most active companies releasing open source software. On top of that, Google has organized Summer of Code five times: a program in which students from all over the world work on open source projects and Google pays them a stipend for a few months of hard work.

Update

Guice - a lightweight dependency injection framework for Java 5 and above
Thanks to JavaBeat for the summary. Google Guice is a Dependency Injection Framework that can be used by applications where relationships/dependencies between business objects would otherwise have to be maintained manually in the application code. Since Guice supports Java 5.0, it takes advantage of Generics and Annotations, thereby making the code type-safe. Documentation is here: Getting Started guide
Google Sitebricks - web framework powered by Guice
Sitebricks is a simple development layer for web applications built on top of Google Guice. Sitebricks focuses on early error detection, low-footprint code, and fast development. Like Guice, it also balances idiomatic Java with an emphasis on concise code.
Here are the Getting Started guide and the 5 minute tutorial.
Google ctemplate
CTemplate is a simple but powerful template language for C++. It emphasizes separating logic from presentation: it is impossible to embed application logic in this template language. Here is some documentation.

Thanks nostrademons from reddit.com
Google C++ Mocking Framework
Inspired by jMock, EasyMock, and Hamcrest, and designed with C++'s specifics in mind, Google C++ Mocking Framework (or Google Mock for short) is a library for writing and using C++ mock classes. Google Mock:
  • lets you create mock classes trivially using simple macros,
  • supports a rich set of matchers and actions,
  • handles unordered, partially ordered, or completely ordered expectations,
  • is extensible by users, and
  • works on Linux, Mac OS X, Windows, Windows Mobile, MinGW, and Symbian.
Here are the Getting Started guide and Google C++ Mocking for Dummies.

Thanks richq from reddit.com
Google C++ Testing Framework
Google's framework for writing C++ tests on a variety of platforms (Linux, Mac OS X, Windows, Cygwin, Windows CE, and Symbian). Based on the xUnit architecture. Supports automatic test discovery, a rich set of assertions, user-defined assertions, death tests, fatal and non-fatal failures, value- and type-parameterized tests, various options for running the tests, and XML test report generation. Here is Google Test Primer and here is Google Test Dev Guide.

Thanks richq from reddit.com
Google Toolbox for Mac
A collection of source code from different Google projects that may be useful to developers working on Macintosh. This package includes the Google Developer Spotlight Importers. The release notes can be found here.

Thanks buffi from reddit.com
OCRopus
This is not entirely a Google project, but its development is sponsored by Google. OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modelling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications. Here are the usage guide and a guide on how to install the development version.

Thanks 13xforever from reddit.com
Ganeti
Ganeti is a cluster virtual server management software tool built on top of existing virtualization technologies such as Xen or KVM and other Open Source software. Ganeti requires pre-installed virtualization software on your servers in order to function. Once installed, the tool will take over the management part of the virtual instances (Xen DomU), e.g. disk creation management, operating system installation for these instances (in co-operation with OS-specific install scripts), and startup, shutdown, failover between physical systems.

Thanks Matt Brown and btgeekboy from reddit.com
skia
Skia is a complete 2D graphic library for drawing Text, Geometries, and Images.
  • 3x3 matrices w/ perspective
  • antialiasing, transparency, filters
  • shaders, xfermodes, maskfilters, patheffects
Projects using skia are: Android and Chrome.

Thanks zxn0 from reddit.com
Google URL parsing and canonicalization library
A small library for parsing and canonicalizing URLs. You can find the README here.
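If "canonicalizing" sounds abstract: the point is that many different spellings of a URL should end up as one comparable string. The Python sketch below shows a few typical steps using only the standard urllib.parse module. The rule set (lowercase scheme and host, drop the default port, use "/" for an empty path) is my own simplified assumption and not the rules this library implements.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports for this sketch only.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return a simplified canonical form of an http(s) URL.

    Sketch limitations: user:password@ parts are dropped and
    IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(canonicalize("HTTP://Example.COM:80"))  # http://example.com/
```

Two URLs can then be compared as plain strings after canonicalization.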

Thanks pkasting
libjingle
Libjingle, the Google Talk Voice and P2P Interoperability Library, is a set of components provided to interoperate with Google Talk's peer-to-peer file sharing and voice calling capabilities (the source includes some samples showing how to build a p2p app). The package includes source code for Google's implementation of Jingle and Jingle-Audio, two proposed extensions to the XMPP standard that are currently available in draft form. It supports both Windows and UNIX/Linux operating systems. Here is the Developer Guide.

Thanks jbking
WebDriver (Selenium)
WebDriver is a sophisticated tool for automating web UI testing. It has a simple API designed to be easy to work with and can drive both real browsers, for testing JavaScript-heavy applications, and a pure 'in memory' solution for faster testing of simpler applications. You can check out the 5 minute introduction on the GettingStarted page. The project has since moved to http://selenium.googlecode.com/; for the latest source, please go there.

Thanks ittiam
Google Gears
Gears is an open source project that enables more powerful web applications, by adding new features to your web browser:
  • Let web applications interact naturally with your desktop
  • Store data locally in a fully-searchable database
  • Run JavaScript in the background to improve performance
Gears is the fastest way to make your web app more like a desktop app.

Thanks Anonymous.
Google Web Toolkit (GWT)
Google Web Toolkit (GWT) is a development toolkit for building and optimizing complex browser-based applications. GWT is used by many products at Google, including Google Wave and Google AdWords. It's open source, completely free, and used by thousands of developers around the world.

Thanks Anonymous.
Native Client
Native Client is an open-source technology for running native code in web applications, with the goal of maintaining the browser neutrality, OS portability, and safety that people expect from web apps. It has been released at an early stage to get feedback from the open-source community. Native Client technology will probably help web developers to create richer and more dynamic browser-based applications. Native Client runs on 32-bit x86 systems that use Windows, Vista, Mac OS X, or Linux. Some ARM and x86-64 support is implemented in the source base, and we hope to make it available for application developers later this year. Here are the Getting Started guide and FAQ.

Thanks zxn0 and ptman from reddit.com

Currently Native Client can run Quake in your browser! :)
Google Gadgets for Linux
Google Gadgets for Linux provides a platform for running desktop gadgets under Linux, catering to the unique needs of Linux users. It's compatible with the gadgets written for Google Desktop for Windows as well as the Universal Gadgets on iGoogle. Following Linux norms, this project is open-sourced under the Apache License. Here are the Getting Started Guide and instructions on how to build the project.

Thanks Tiger Dong
Google Caja
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications.

Thanks phosphorescente from reddit.com
scarcity
Scarcity is a framework for concurrent garbage collection in C++. The framework is organized around the principle of "policy-based design", meaning that behaviors are customized and extended via template parameters. Policy-based design facilitates seamless integration with a broad set of VMs and other runtime environments by allowing the host environment to replace any aspect of the framework, such as thread synchronization primitives, atomic data types, error logging facilities, tracing strategies and so on.
Google concurrency library
A concurrency library for C++. Here is the getting started guide.
Cppclean
CppClean attempts to find problems in C++ source that slow development particularly in large code bases. It is similar to lint; however, CppClean focuses on finding global inter-module problems rather than local problems similar to other static analysis tools. The goal is to find problems that slow development in large code bases that are modified over time leaving unused code. This code can come in many forms from unused functions, methods, data members, types, etc to unnecessary #include directives. Unnecessary #includes can cause considerable extra compiles increasing the edit-compile-run cycle.

Here are some details about the implementation.
Unladen swallow
An optimized branch of CPython, intended to be fully compatible and significantly faster. Unladen Swallow is Google-sponsored, but not Google-owned. The engineers on the project are full-time Google engineers, but ultimately this is an open-source project, not really that different from Chrome or Google Web Toolkit. Here is the Getting Started Guide.

Thanks Anonymous
Closure Tools
The Closure tools help developers to build rich web applications with JavaScript that is both powerful and efficient. The Closure Compiler compiles JavaScript into compact, high-performance code. The Closure Library is a broad, well-tested, modular, and cross-browser JavaScript library. Closure Templates simplify the task of dynamically generating HTML. Here is documentation.

Thanks Anonymous
SPDY
SPDY is an experiment with protocols for the web. Its goal is to reduce the latency of web pages. SPDY (pronounced "SPeeDY") is an application-layer protocol for transporting content over the web, designed specifically for minimal latency. There is a SPDY-enabled Google Chrome browser and an open-source web server. In lab tests, the Google team observed up to 64% reductions in page load times when using SPDY.

Thanks Anoop.

Update #2

cmockery
There are a variety of C unit testing frameworks available; however, many of them are fairly complex and require the latest compiler technology. Some development requires the use of old compilers, which makes it difficult to use some unit testing frameworks. In addition, many unit testing frameworks assume the code being tested is an application or module that is targeted to the same platform that will ultimately execute the test. Because of this assumption, many frameworks require the inclusion of standard C library headers in the code module being tested, which may collide with the custom or incomplete implementation of the C library utilized by the code under test. Cmockery only requires that a test application is linked with the standard C library, which minimizes conflicts with standard C library headers. Also, Cmockery tries to avoid the use of some of the newer features of C compilers. For more information check out the manual.
Perl AppEngine
This project is to get Perl implemented as a supported language on Google App Engine. Want to support Perl? - Read Getting Started.
Perl ProtoBuf
Protocol Buffers for Perl.
Perl Sys::Protect
Perl XS module to override all "dangerous" Perl operations (any operation which interacts with the system). Notably, this module aims to provide the user with an environment identical to the restrictions in place on Google App Engine for Python.
Google App Engine
Google App Engine enables developers to build web applications on the same scalable systems that power our own applications. Google App Engine makes it easy to design scalable applications that grow from one to millions of users without infrastructure headaches. Here are some SDK Release Notes.
JRuby App Engine
JRuby on Google App Engine. With support for the Java Language, it's now possible to run Ruby code on Google App Engine. This project aims to make using JRuby as easy as any of the native App Engine languages. Although Google employees may participate in this project, the code is experimental and is not officially supported by Google.
Android Scripting
The Android Scripting Environment (ASE) brings scripting languages to Android by allowing you to edit and execute scripts and interactive interpreters directly on the Android device. These scripts have access to many of the APIs available to full-fledged Android applications, but with a greatly simplified interface. Want to know more? Check out the FAQ.
Eyes Free
Speech Enabled Eyes-Free Android Applications. The Text-To-Speech (TTS) library allows developers to add speech to their applications. Developers give the TTS object a text string, and the TTS will take care of converting that string to speech and speaking it to the user. The TTS library is designed such that different underlying speech engines can be used without affecting the higher level application logic. Currently, a port of the eSpeak engine is available. Here is the Getting Started Guide.
MAO - An Extensible Micro-Architectural Optimizer
This project seeks to build an infrastructure for micro-architectural optimizations at the instruction level. MAO is a stand alone tool that works on the assembly level. MAO parses the assembly file, performs all optimizations, and re-emits another assembly file. After this, the assembler can be invoked to produce a binary object. MAO reuses much of the code in the GNU Assembler (gas) and needs binutils-2.19 to build correctly. Please see the README.txt file for information on how to build and run MAO. The current MAO version is an early prototype targeting x86.
Google documentation reader
Reading web-based developer documentation is different than browsing typical web pages. As a developer, you probably refer to key technical doc many times per day, and you want it well-organized, easy to navigate, and -- above all -- fast. It works with any open source project hosted on Google Code.
SocialGraph Node Mapper
The Social Graph Node Mapper is a community project to build a portable library to map social networking sites' URLs to and from a new canonical form.
Google visualization
This library makes it easy to implement a Visualization data source so that you can easily chart or visualize your data from any of your data stores. The library implements the Google Visualization API wire protocol and query language. You therefore need write only the code required to make your data available to the library in the form of a data table. This task is made easier by the provision of abstract classes and helper functions.
deeptorch
This is an extension of the Torch3 Machine Learning library for handling various types of Deep Architectures and modifications to the standard Multi-layer Perceptrons:
  • Handles an arbitrary number of fully-connected sigmoidal layers
  • Unsupervised learning of MLPs using various reconstruction costs. Greedy layer-wise learning is available as well.
  • An implementation of the Stacked Denoising Autoencoders
  • A preliminary implementation of the collective learning idea, whereby a pair of networks are trained in parallel and communicate with each other.
A Google employee is involved in this project (it is not an official Google project). Documentation is here.
Bunny The Fuzzer
A closed loop, high-performance, general purpose protocol-blind fuzzer for C programs. Uses compiler-level integration to seamlessly inject precise and reliable instrumentation hooks into the traced program. These hooks enable the fuzzer to receive real-time feedback on changes to the function call path, call parameters, and return values in response to variations in input data. This architecture makes it possible to significantly improve the coverage of the testing process without a noticeable performance impact usually associated with other attempts to peek into run-time internals. A Google employee is involved in this project (it is not an official Google project). Here are some docs.
Thread weaver
Thread Weaver is a framework for writing multi-threaded unit tests in Java. It provides mechanisms for creating breakpoints within your code, and for halting execution of a thread when a breakpoint is reached. Other threads can then run while the first thread is blocked. This allows you to write repeatable tests that can check for race conditions and thread safety. Here is the user guide.
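The breakpoint idea described above can be imitated in a few lines of Python to show why it makes race tests repeatable. This is only a sketch of the concept and not Thread Weaver's API (which is Java): the Counter class, the after_read hook and the function name are all invented for this example.

```python
import threading


class Counter:
    """Non-atomic counter with a hook between its read and its write."""

    def __init__(self):
        self.value = 0

    def increment(self, after_read=lambda: None):
        current = self.value      # read
        after_read()              # "breakpoint": a test may pause the thread here
        self.value = current + 1  # write


def run_forced_interleaving() -> int:
    counter = Counter()
    reached = threading.Event()   # set when the first thread hits the breakpoint
    resume = threading.Event()    # set when the first thread may continue

    def pause():
        reached.set()
        resume.wait()

    first = threading.Thread(target=counter.increment, args=(pause,))
    first.start()
    reached.wait()                # first thread is now blocked after its read
    counter.increment()           # a second increment runs to completion
    resume.set()                  # let the first thread finish its stale write
    first.join()
    return counter.value


print(run_forced_interleaving())  # 1, although increment() was called twice
```

Because the test decides exactly where the first thread stops, the lost update happens on every run instead of once in a thousand runs.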
Google coredumper
A neat tool for creating GDB readable coredumps from multithreaded applications. The coredumper library can be compiled into applications to create core dumps of the running program -- without terminating. It supports both single- and multi-threaded core dumps, even if the kernel does not natively support multi-threaded core files.
Rollcage API : Sandboxing for Windows
The Rollcage API can be used to sandbox an application on Windows. It is primarily used by Chromium, the open source browser project behind Google Chrome. Here is the design overview.
Google gtags
Server-based tags serving for large codebases. Clients in Python and for Emacs and Vim. This is an extension to GNU Emacs and X-Emacs TAGS functionality, with a server-side component that narrows down the view of a potentially large TAGS file and serves the narrowed view over the wire for better performance. An Emacs Lisp client, a Python client, and Vim extensions are supplied.
Prettyprint
PP is intended to provide infrastructure and tools to describe and manipulate hardware registers and fields. Once described, it is possible to read and write fields symbolically. This allows one to browse the state of their hardware.
iotools
The iotools package provides a set of simple command line tools which allow access to hardware device registers. Supported register interfaces include PCI, IO, memory mapped IO, SMBus, CPUID, and MSR. Also included are some utilities which allow for simple arithmetic, logical, and other operations. If you ever have to debug hardware, you could probably use these tools.
sofia-ml
The suite of fast incremental algorithms for machine learning (sofia-ml) can be used for training models for classification or ranking, using several different techniques. This release is intended to aid researchers and practitioners who require fast methods for classification and ranking on large, sparse data sets. Includes methods for learning classification and ranking models, using Pegasos SVM, SGD-SVM, ROMMA, Passive-Aggressive Perceptron, Perceptron with Margins, and Logistic Regression.
plda
A parallel C++ implementation of fast Gibbs sampling of Latent Dirichlet Allocation
stubl - Stateless (IPv6) Tunnel Broker for LANs
Stubl is a transition mechanism for providing a basic level of IPv6 connectivity to individual nodes on a private network. All that's required is a single Linux server with an IPv6 /64 subnet routed to it. The Stubl server consists of a Linux kernel module (stubl.ko) for handling the tunnel packets, and an HTTP server (stubl_http.py) for calculating clients' addresses and providing tunnel setup instructions. The main advantage of Stubl is that it allows a user on the network, running any major OS, to get a working IPv6 connection with nothing but a few lines of shell commands. This makes it very easy for developers to start getting familiar with the protocol, with minimal administrative overhead.
dcs-bwt-compressor
dcsbwt is a data compressor program and library based on the Burrows-Wheeler transform.
DepAn: Dependency visualization and analysis
DepAn is a direct manipulation tool for visualization, analysis, and refactoring of dependencies in large applications. Check out the User Guide.
Google mobwrite
MobWrite converts forms and web applications into collaborative environments. Create a simple single-user system, add one line of JavaScript, and instantly get a collaborative system.
open-vcdiff
An encoder and decoder for the format described in RFC 3284: "The VCDIFF Generic Differencing and Compression Data Format." The encoding strategy is largely based on Bentley-McIlroy 99: "Data Compression Using Long Common Strings." A library with a simple API is included, as well as a command-line executable that can apply the encoder and decoder to source, target, and delta files. A slight variation from the draft standard is defined to allow chunk-by-chunk decoding when only a partial delta file window is available.
update-engine
Update Engine is a flexible Mac OS X framework that can help developers keep their products up-to-date. It can update nearly any type of software, including Cocoa apps, screen savers, and preference panes. It can even update kernel extensions, regular files, and root-owned applications. Update Engine can even update multiple products just as easily as it can update one.
Google site map generator
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. By creating and submitting Sitemaps to search engines, you are more likely to get better freshness and coverage in search engines. Google Sitemap Generator is a tool installed on your web server to generate the Sitemaps automatically. Unlike many other third party Sitemap generation tools, Google Sitemap Generator takes a different approach: it will monitor your web server traffic, and detect updates to your website automatically.
Google Pose Optimizer
The Google pose optimizer (GPO) is a C++ library that allows reconstruction of the pose of a sensor platform (i.e. its position and orientation over time) based on information from sensors such as GPS, accelerometers and rate gyroscopes. GPO does not provide real-time localization in the way that a Kalman filter would; instead, it generates the pose as the result of a large off-line optimization. This produces better results. Here is the wiki.
Google dnswall
dnswall is a daemon that filters out private IP addresses in DNS responses. It is designed to be used in conjunction with an existing recursive DNS resolver in order to protect networks against DNS rebinding attacks. For details of the attack and various defenses, including dnswall, see http://crypto.stanford.edu/dns/.
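To make the idea of "filtering out private IP addresses" concrete, here is a minimal shell sketch of such a check. It is not dnswall's actual code: the function name is_private_ipv4 is made up for illustration, and it only covers the RFC 1918 ranges plus loopback for IPv4.

```shell
# Hypothetical helper, not part of dnswall: succeeds (exit status 0)
# when the given IPv4 address is in a private (RFC 1918) or loopback range.
is_private_ipv4() {
    case "$1" in
        10.*) return 0 ;;                                      # 10.0.0.0/8
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;     # 172.16.0.0/12
        192.168.*) return 0 ;;                                 # 192.168.0.0/16
        127.*) return 0 ;;                                     # loopback
        *) return 1 ;;
    esac
}

# Example: decide what a filter would do with a few answers
for ip in 8.8.8.8 192.168.1.10 172.20.0.1; do
    if is_private_ipv4 "$ip"; then
        echo "$ip: private, would be filtered"
    else
        echo "$ip: public, would be passed through"
    fi
done
```

A DNS rebinding attack relies on an external name resolving to an internal address, which is why dropping such answers at the resolver helps.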
Google timezone
Choose from a list of major cities around the world or define your own if it's not on the list. Set one of six layouts for your clocks and choose a design and a background for each clock independently. Add up to 15 clocks and never lose track of time again.
Radiohead ;)
Go here for details
GeN
GeN - an open-source system for learning generative models of relational data.

26 Dec 2009

34 Open Source projects released by Google (34 projekty Open Source udostępnione przez Google)

Google is one of the largest companies supporting the free software movement. The giant from Mountain View has released more than 500 projects as open source in total; I will try to list only the more interesting ones that have been made public.

Text file processing

Google CRUSH (Custom Reporting Utilities for SHell)
A collection of tools for working with TSV/CSV files, either from the command line or from shell scripts.

C++ libraries and sources

Google Breakpad
An open-source system for diagnosing software failures (a crash reporting system).
Google GFlags
GFlags is a library for processing command-line arguments. It can be seen as a replacement for getopt(), but with much greater flexibility and added support for C++ types such as string.
Google Glog
The Glog library provides application logging through a convenient stream-based interface. Glog also offers many ready-made macros that can be used while debugging software.
Google PerfTools
Google PerfTools were created so that developers can build better and more robust applications. They are especially useful for multi-threaded C++ applications that use templates. The project includes heap-checker, heap-profiler and cpu-profiler.
Google Sparse Hash
Google Sparse Hash is a hash map implementation optimized for memory footprint.
Omaha - Google Update
Omaha, better known as Google Update, is a tool that keeps installed software up to date. So far Omaha is used in Google products for Windows (such as Google Chrome and Google Earth), but it can also be used in third-party software.
Protocol Buffers
Protocol Buffers is a format for encoding structured data and preparing it for transmission over a network. The format is easy to extend and at the same time very efficient. Google uses Protocol Buffers in practically all of its internal RPC services. The format is also supported by NetBeans (there is a plugin that helps with creating Protocol Buffers).

Internet

Google Code Prettify
A JavaScript module and CSS file for syntax highlighting of source code snippets on a web page. The lexer handles C and its relatives, Java, Python, Ruby, PHP, VisualBasic, AWK, Bash, SQL, HTML, XML, CSS, JavaScript and Makefiles, and works on a large part of Perl scripts. Smalltalk and all CAML-derived languages are not supported.
SpriteMe - creating "CSS sprites"
SpriteMe makes it very easy to create CSS sprites (combining many small image files into one image, then cutting out the individual pictures and placing them on the page with CSS on the client side). This optimization reduces the number of requests the browser must make to the web server to load the whole page, which shortens page load time. The service is also available at: http://spriteme.org/.
Reducisaurus
Reducisaurus is a service for minifying and serving CSS and JavaScript files. It is based on the YUI Compressor and runs on AppEngine.
JaikuEngine
JaikuEngine is a microblogging service that runs on AppEngine. JaikuEngine powers Jaiku.com. A mobile client is also available.
Selector Shell
Provides a "shell" inside the browser for testing CSS selectors.
Google Feed Server
Google Feed Server is an open-source implementation of an Atom Publishing Protocol server, based on the Apache Abdera framework. It provides a simple back end for data adapters, which let developers quickly build an Atom feed from existing data such as a database.
NameBench
NameBench measures the speed of different DNS servers. Test data can come from browser history, tcpdump captures or standard data sets. Namebench is completely free and does not modify the system in any way. The project was started as part of Google's 20% time for personal projects. It is worth mentioning here that Google offers its own (caching) DNS servers, available at the IP addresses 8.8.8.8 and 8.8.4.4.
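For a quick manual comparison of two DNS servers without NameBench, one option is to read the "Query time" line that dig prints. The sketch below is only an illustration and not how NameBench works; the helper name query_time_ms and the domain example.com are assumptions.

```shell
# Hypothetical helper: extract the number of milliseconds from the
# ";; Query time: N msec" line in dig's output (read from stdin).
query_time_ms() {
    awk '/Query time:/ { print $4 }'
}

# Usage (needs network access and the dig tool):
#   dig @8.8.8.8 example.com | query_time_ms
#   dig @8.8.4.4 example.com | query_time_ms
```

A single query says little, since caching and network jitter dominate; a benchmark such as NameBench repeats many lookups for that reason.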
Rat Proxy
RatProxy is a semi-automated, passive tool for auditing the security of web services. It is optimized for detecting and automatically categorizing potential security problems by observing user-generated traffic. The tool was created to make security analysis easier in complex web 2.0 environments.
TopDraw
TopDraw is a program for generating images from a description written in a JavaScript-like language. It can create very advanced and interesting image compositions. The coolest part of the project is the built-in mechanism for creating images and installing them as wallpaper. The package also includes a viewer that can be installed in the toolbar and run an image-generating script at a set interval.
etherpad
EtherPad is a web-based text editor that lets up to eight people work on one document in real time. Everyone can edit the document at the same time and see all changes made by the other participants (each user has their own color - see the screencast). Participants can save changes to the document at any moment. The application was created by AppJet and is built on JavaScript, Java and a Comet server.
Chromium
Chromium is the open-source project behind the Google Chrome browser (in practice it differs very little from the Google-branded version).
V8 Google's open source JavaScript engine
V8 is a JavaScript engine written entirely in C++ and used in Google Chrome. V8 implements ECMAScript (that is, JavaScript) as specified in ECMA-262, 3rd edition. It runs on Windows XP and Vista, Mac OS X 10.5 (Leopard) and Linux on IA-32 and ARM.
Chromium OS
The goal of the project is to build an operating system that gives users a fast, simple and secure platform for browsing and creating content on the web. Chromium OS is released with its source code, which is available at: http://git.chromium.org/ src
Android
Android is the first free, open-source and fully customizable mobile platform. It offers a full stack: an operating system, middleware and core mobile applications. The Android platform includes a rich set of APIs that let developers build interesting applications that integrate with the operating system on a mobile device.

Tools for database servers - MySQL

Google MySQL Tools
Google mMAIM
mMAIM is meant for monitoring and analyzing MySQL-based database servers. It can easily be integrated into any environment that runs MySQL. It shows Master/Slave replication status, statistics for the whole database, statistics from all 'SHOW' commands, and much more. The package contains many tools for managing, monitoring and improving the performance of MySQL-based databases; the project was originally developed by Google.

Other

Stressful Application Test (stressapptest)
Stressful Application Test (on Unix: stressapptest) is a tool that puts a computer under heavy load to check how the individual components of the machine behave. It generates data traffic from the processor to memory, as well as a large number of I/O operations. It is used at Google and is now available as an open-source project under the Apache 2.0 license.
Pop and IMAP Troubleshooter
POP and IMAP Troubleshooter diagnoses and solves problems with connecting client computers to mail servers over POP3 and IMAP. The program can read the configuration files of mail clients (Outlook, Windows Mail, Thunderbird, etc.), check individual settings, and then try to make POP, IMAP and SMTP connections using those settings.
OpenDuckBill
Openduckbill is a command-line data backup program for Linux. It monitors changes in files and directories marked for backup and synchronizes them to a local backup directory, a remote NFS share, or a server using rsync.
ZXing
ZXing ("zebra crossing") is a library for recognizing 1D and 2D barcodes. It is available with source code, supports many image types and is written in Java. Its main goal is to allow barcode processing on mobile devices such as phones without communicating with a server. If I am not mistaken, it is used in the Android platform.
Tesseract OCR Engine
The Tesseract text recognition engine was one of the top 3 engines in the 1995 UNLV accuracy test. Between 1995 and 2006 it saw few modifications, but it is still most likely one of the most accurate OCR systems released as open source. The source code reads a binary, grayscale or color image and outputs text. A reader for uncompressed TIFF images is included with the project.
Neatx - Open Source NX server
Neatx is an open-source project similar to the NX server from NoMachine. The NX protocol beats VNC hands down on performance, which is a big advantage on a slow link. The main differences between NX and VNC:
  • NX is an X11 client and does not send images the way VNC does
  • NX works with X, VNC and Remote Desktop (Windows)
  • NX caches data
  • NX is simpler to configure
FreeNX may be an alternative project.
PSVM
A version of the SVM (support vector machine) that can be run in parallel on many machines. Details are described in the paper.
The Go programming language
A new programming language created at Google. Its syntax is similar to C and Python.
Google styleguide
A set of rules followed when writing application code at Google.

Summary

Google is one of the most active companies supporting the free software movement, publishing some of its projects on the web for free. What is more, every year it organizes Summer of Code, a program supporting open-source software in which students carry out tasks for existing open-source projects and receive a stipend funded by Google.

24 Dec 2009

Microsoft LifeCAM NX-3000 on Linux and Skype

The Microsoft LifeCAM NX-3000 is a nice little webcam that works on GNU/Linux (which is surprising). Some people have problems using this webcam with Skype.

How to solve problems with Skype

  • Download the newest version of Skype
  • Install it: dpkg --force-all -i skype-*.deb
  • Configure PulseAudio (using the GUI)

    Go to Audio settings

    Check that your webcam is detected (if not, install additional kernel modules)

    Change the sound input device

  • Start Skype for Linux, and start chatting with friends
The PulseAudio manual has some helpful advice, but getting this running was very simple.

Other links

Summary

The new version of Skype for Linux is finally OK: my PC stays quiet during Skype conversations and I have plenty of resources left to use (Skype is evolving in a good direction).

23 Dec 2009

Text manipulation in Bash

Bash is quite a good tool for text manipulation (of course it is no match for Perl/sed/AWK/Python), but it has a lot of functionality built in, and many people do not even realize it. In this post I will try to present some of that functionality:
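As a taste of what is built in, here is a minimal sketch of a few parameter expansions that need no external tools. The sample path and the variable names are made up for illustration.

```shell
#!/bin/bash
# Sample value, made up for illustration
path='/home/user/archive.tar.gz'

length=${#path}              # number of characters in the string
file=${path##*/}             # strip longest prefix matching */   -> archive.tar.gz
dir=${path%/*}               # strip shortest suffix matching /*  -> /home/user
stem=${path%%.*}             # strip longest suffix matching .*   -> /home/user/archive
swapped=${path/user/guest}   # replace the first occurrence of "user" with "guest"
piece=${path:6:4}            # 4 characters starting at offset 6  -> user

echo "$length $file $dir $stem $swapped $piece"
```

The # and % forms also work in a plain POSIX shell; the substitution (${var/old/new}) and substring (${var:offset:length}) forms are Bash extensions.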

19 Dec 2009

BitLocker without a TPM module in Windows 7

Windows BitLocker can store the disk key on a USB stick, not only in a TPM hardware module. To make this possible you have to enable some advanced settings (why is there no dialog like "save my key on USB disk"?).

How to save the Windows 7 BitLocker key on a USB stick?

  • Click: Start | Search, type gpedit.msc and hit Enter
  • Navigate to:
    • Local Computer Policy
    • + Computer Configuration
    • ++ Administrative Templates
    • +++ Windows Components
    • ++++ BitLocker Drive Encryption
    • +++++ Operating System Drives -> Require additional authentication at startup
  • Enable the policy and check "Allow BitLocker without a compatible TPM"
  • Rerun the BitLocker Wizard

Once you have allowed BitLocker without a TPM, the wizard in the BitLocker Drive Preparation will let you store the Startup Key on a USB flash drive. It also lets you save a Recovery Key, which you will need if you lose your USB stick.

You will then be asked whether you want to run a BitLocker System Check. If you agree, your computer will be restarted to check whether the USB device is available during the boot-up process (that is a nice idea).

This super mini howto is based on: Windows7 BitLocker Review, and it is posted mostly for me (I can never remember the path in gpedit.msc :().

Notes

There are also other, cross-platform ways to secure your data. One of them is TrueCrypt, which is comparable to BitLocker; if you are interested in how it really works, you can read the article: How does TrueCrypt work - explained.

Additional links

The links below are not related to BitLocker, but I think they may be useful to me someday:

14 Dec 2009

An idea for a file server

For some time now I have been able to use, during the day, a disk array (NAS) connected to an Ethernet network: two disks in RAID1 enclosed in a very small case :). I have to say this kind of solution is very convenient for sharing files between a few computers (recommended!), or simply as a place to store large amounts of data (backup, storage). The array I use has quite a few functions :), but I thought I would write down here the ones I would like to have in such a device.

13 Dec 2009

Electronic devices - rapid prototyping environments

There are many platforms that can be used for rapid prototyping of electronic devices. In this article I will write about two popular ones.

Arduino

Arduino is an open-source electronics prototyping platform based on flexible, easy-to-use hardware and software. It's intended for artists, designers, hobbyists, and anyone interested in creating interactive objects or environments.
What is inside Arduino?
version 2009
  • ATmega168/ATmega328
  • 16 KB Flash Memory (ATmega168)/32 KB Flash Memory (ATmega328) (2 KB are used by the bootloader)
  • 1 KB SRAM (ATmega168)/2 KB SRAM (ATmega328)
  • 512 bytes EEPROM (ATmega168)/1 KB EEPROM (ATmega328)
  • 14 digital input/output pins (of which 6 can be used as PWM outputs)
  • 6 analog inputs
  • 16 MHz crystal oscillator
  • USB connection
  • power jack
  • built-in LED
  • ICSP header
  • reset button
  • I2C support
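Since the bootloader takes part of the flash, the space left for your own program is smaller than the headline figure. A small sketch of that arithmetic, using the numbers from the list above (the function name usable_flash_bytes is made up for illustration):

```shell
# Hypothetical helper: flash left for user code, in bytes.
# $1 = total flash in KB, $2 = bootloader size in KB
usable_flash_bytes() {
    echo $(( ($1 - $2) * 1024 ))
}

usable_flash_bytes 16 2   # ATmega168: prints 14336
usable_flash_bytes 32 2   # ATmega328: prints 30720
```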
Some helpful links on Arduino
Inspirations for cool projects
The projects listed above were made using an ATmega32:

Sun Spot

Project Sun SPOT (Small Programmable Object Technology) was created to encourage the development of new applications and devices. It is designed from the ground up to allow programmers who never before worked with embedded devices to think beyond the keyboard, mouse and screen and write programs that interact with each other, the environment and their users in completely new ways. A Java programmer can use standard Java development tools such as NetBeans to write code.
What is Sun Spot
Read: What is SunSPOT - Introduction. It has many nice features:
  • Embedded Development Platform
  • Easy to program - Java top to bottom
  • It has Wireless Communication (Overlay Network - CTP, IPv6/LowPan ; Mesh Networking - AODV, LQRP) ; Multi-hop Over the Air Programming
  • Built in Lithium Ion battery charged through USB
The kit contains two Sun SPOTs plus a base station (the base station is only a processor board without sensors).
What is inside
  • 180 MHz 32-bit ARM920T with 512K RAM and 4M Flash
  • 2.4 GHz IEEE 802.15.4 support
  • USB interface
  • light sensor ; temperature sensor ; 8 colour LEDs ; inputs/outputs ; ADC ; 2 buttons ; accelerometer
It is based on open hardware, and the schematics are available. Schematics can be downloaded from here, and software from here. Sun SPOT runs the Squawk VM and its software is written in Java (there is a special version of the NetBeans IDE); check out the sources for details. There is also a nice tutorial: how to use the emulator.
Sample projects based on Sun Spot

Conclusion

Arduino
  • Great for starting out - cheap (remember that you can damage a device during development)
  • Lots of tutorials, references and sample projects
  • Many people use it!
  • An SDK is provided
Sun Spot
  • Sun Spot is more powerful out of the box
  • Perfect for creating mesh networks, sensor networks and similar setups
  • Programmable in Java - you can use NetBeans
  • Not good for starting out - quite expensive, but it has many features built in
  • The hardware comes in a case - ready to use outdoors
It is a pity that I don't have enough time to start playing with this stuff.

3 Dec 2009

Ubuntu screen profiles in SUSE Linux

What is this all about?

I like the look of the screen application in Ubuntu. That look is provided by the screen-profiles package, which is not available in SUSE Linux (SLES 11).

Before you start

Run zypper install newt newt-python to install dependencies.

Solution

Contents of install-screen-profiles.sh:
#!/bin/bash

PACKAGE_NAME='screen-profiles_1.44-0ubuntu1.2_all.deb'
WORKDIR='workdir-sp'
OPWD=$(pwd)

# Fetch the package; stop if the download fails
wget "http://us.archive.ubuntu.com/ubuntu/pool/main/s/screen-profiles/$PACKAGE_NAME" || exit 1

mkdir -p "$WORKDIR"

cd "$WORKDIR" || exit 1

# Unpack the .deb archive (this yields data.tar.gz, among others)
ar x "$OPWD/$PACKAGE_NAME"

# Keep the original screen binary as screen.real, unless one is already there
if [ ! -x "/usr/bin/screen.real" ] ; then
   sudo mv /usr/bin/screen /usr/bin/screen.real
else
   echo "/usr/bin/screen.real already exists, not moving /usr/bin/screen - this can break your screen app"
fi

# Unpack the package payload
tar -xvzf data.tar.gz

# Install it: copy every file to the same path under /
sudo find usr -type f -exec install -D -m 755 {} "/{}" \;
sudo find var -type f -exec install -D -m 755 {} "/{}" \;

echo "You may want to delete \"$WORKDIR\" (rm -rf $WORKDIR)"
Run bash install-screen-profiles.sh
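Because the script copies files straight into / with sudo, it can be useful to preview what would be written first. The helper below is a sketch and not part of the original script: the name preview_install is made up, and it assumes the package payload has already been unpacked into the given directory.

```shell
# Hypothetical helper: list the absolute paths that the install step
# would write, given the directory where data.tar.gz was unpacked.
preview_install() {
    (
        cd "$1" || exit 1
        # Same selection as the install step: regular files under usr and var
        find usr var -type f 2>/dev/null | sort | sed 's|^|/|'
    )
}

# Usage, after the unpack steps of the script have run:
#   preview_install workdir-sp
```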

Bugs

  • If you want screen to start when you log in, run bash /usr/share/screen-profiles/screen-launcher-install; to undo this, run bash /usr/share/screen-profiles/screen-launcher-uninstall.
  • F9 behaves a little strangely, so use it at your own risk.