4 Jun 2012

How to build a deb package of a Perl module (Lingua::Identify)

What is Lingua::Identify?

Lingua::Identify is a Perl module for identifying the language of a text. Remember that language identification is never 100% accurate.

Why use Lingua::Identify?

Here is a list of the features of this module that matter most to me:
  • it's free and open source;
  • it supports Unicode out of the box;
  • it's a module, so you can easily build your own application on top of it;
  • it handles large inputs;
  • it's maintained.
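To illustrate the "it's a module" point, here is a minimal sketch of calling the module from the shell. It assumes Lingua::Identify is already installed (for example from the package built below); `langof()` is exported by the module's `:language_identification` tag and returns the most probable language when called in scalar context, while the wrapper name `identify_lang` is hypothetical.

```shell
# Hypothetical wrapper: print the most likely language of the text given
# as arguments. Requires Lingua::Identify; langof() comes from the module.
identify_lang() {
    # -M loads the module and imports the :language_identification tag;
    # scalar() forces scalar context so langof returns only the best guess.
    perl -MLingua::Identify=:language_identification \
         -e 'print scalar(langof(join(" ", @ARGV))), "\n"' "$@"
}
```

For example, `identify_lang "This sentence is written in English."` prints a language code. Keep in mind that very short inputs are exactly where identification is least reliable.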

How to create a deb package of Lingua::Identify?

# Install the packages required to build *.deb files
sudo apt-get install dh-make-perl devscripts apt-file
sudo apt-file update

# Install the Lingua::Identify dependencies that are available as packages
sudo apt-get install libclass-factory-util-perl libtext-affixes-perl

# Lingua::Identify also depends on Text::Ngram, which is not packaged, so we build it too
# http://search.cpan.org/~ambs/Text-Ngram-0.14/lib/Text/Ngram.pm
wget http://search.cpan.org/CPAN/authors/id/A/AM/AMBS/Text/Text-Ngram-0.14.tar.gz

# http://search.cpan.org/~ambs/Lingua-Identify-0.51/lib/Lingua/Identify.pm
wget http://search.cpan.org/CPAN/authors/id/A/AM/AMBS/Lingua/Lingua-Identify-0.51.tar.gz

# Move the tarballs into a working directory
mkdir test
mv *.tar.gz test/
cd test

# Build Text::Ngram
tar -pzxvf Text-Ngram-0.14.tar.gz 
mv Text-Ngram-0.14.tar.gz libtext-ngram-perl_0.14.orig.tar.gz
dh-make-perl Text-Ngram-0.14
cd Text-Ngram-0.14
debuild -us -uc

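# Install the package and check that the module loads
# (the architecture suffix of the .deb, e.g. i386 or amd64, depends on your system)
cd ..
sudo dpkg -i libtext-ngram-perl_0.14-1_*.deb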
perl -e 'use Text::Ngram qw(ngram_counts add_to_counts);'

# Build Lingua::Identify (depends on Text::Ngram)
tar -pzxvf Lingua-Identify-0.51.tar.gz
mv Lingua-Identify-0.51.tar.gz liblingua-identify-perl_0.51.orig.tar.gz
dh-make-perl Lingua-Identify-0.51
cd Lingua-Identify-0.51/
debuild -us -uc

# Install our new package
cd ..
sudo dpkg -i liblingua-identify-perl_0.51-1_all.deb 
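The unpack, rename, dh-make-perl and debuild steps are identical for both modules, so they can be wrapped in a few shell functions. This is a sketch under one assumption: the tarball follows the `Dist-Name-VERSION.tar.gz` pattern and maps to `libdist-name-perl_VERSION.orig.tar.gz`, exactly as in the renames above. The function names are hypothetical.

```shell
# Debian source package name, e.g. Text-Ngram-0.14.tar.gz -> libtext-ngram-perl
deb_source_name() {
    dist=${1%.tar.gz}     # strip the extension: Text-Ngram-0.14
    name=${dist%-*}       # strip the version:   Text-Ngram
    printf 'lib%s-perl\n' "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
}

# Upstream version, e.g. Text-Ngram-0.14.tar.gz -> 0.14
deb_upstream_version() {
    dist=${1%.tar.gz}
    printf '%s\n' "${dist##*-}"
}

# Name of the .orig tarball that debuild expects next to the source directory
deb_orig_name() {
    printf '%s_%s.orig.tar.gz\n' "$(deb_source_name "$1")" "$(deb_upstream_version "$1")"
}

# Unpack, rename and build one CPAN tarball (needs dh-make-perl and devscripts)
build_cpan_deb() {
    tarball=$1
    srcdir=${tarball%.tar.gz}
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
    tar -xzf "$tarball" || return 1
    mv "$tarball" "$(deb_orig_name "$tarball")" || return 1
    dh-make-perl "$srcdir" || return 1
    # -us -uc: build without signing the source package and .changes file
    (cd "$srcdir" && debuild -us -uc)
}
```

With the functions loaded, run `build_cpan_deb Text-Ngram-0.14.tar.gz`, install the resulting .deb, and only then run `build_cpan_deb Lingua-Identify-0.51.tar.gz`. Text::Ngram must be installed before Lingua::Identify is built.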

Done!

Your first Ubuntu/Debian package of a Perl module is built and installed. It is not perfect, but it works! If you want to upload your package to a repository, there is a lot more to learn first.

Supplementary material

If you are interested in writing your own language identifier using the n-gram method, here are some links that could be helpful:
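Independently of those links, the core of the n-gram method fits in a few lines of shell and awk. The sketch below is a character-trigram scorer with add-one smoothing (a simple naive Bayes variant), not the algorithm Lingua::Identify itself uses. The function name `ngram_guess` and the layout of one training file per language, named after its label (`en.txt`, `pt.txt`, ...), are hypothetical, and the normalisation keeps only ASCII letters.

```shell
# Usage: ngram_guess "text to classify" en.txt pt.txt ...
# Prints the label (file name without .txt) of the best-scoring language.
ngram_guess() {
    text=$1; shift
    [ $# -gt 0 ] || { echo "need at least one training file" >&2; return 1; }
    awk -v text="$text" -v n=3 '
        # Lower-case, keep ASCII letters only, pad with spaces so that
        # word starts and ends also produce n-grams
        function norm(s) {
            s = tolower(s)
            gsub(/[^a-z]+/, " ", s)
            gsub(/^ +| +$/, "", s)
            return " " s " "
        }
        # New training file: derive the language label from its name
        FNR == 1 {
            lang = FILENAME
            sub(/^.*\//, "", lang)
            sub(/\.txt$/, "", lang)
            langs[lang] = 1
        }
        # Count the character n-grams of every training line
        {
            s = norm($0)
            for (i = 1; i <= length(s) - n + 1; i++) {
                g = substr(s, i, n)
                count[lang, g]++; total[lang]++; vocab[g] = 1
            }
        }
        # Score the input against each profile with add-one smoothing
        END {
            v = 0
            for (g in vocab) v++
            s = norm(text); best = ""
            for (l in langs) {
                score = 0
                for (i = 1; i <= length(s) - n + 1; i++) {
                    g = substr(s, i, n)
                    score += log((count[l, g] + 1) / (total[l] + v))
                }
                if (best == "" || score > bestscore) { best = l; bestscore = score }
            }
            print best
        }
    ' "$@"
}
```

Log-probabilities are summed rather than multiplied to avoid floating-point underflow on long inputs. Real identifiers train on much larger samples than a few sentences per language.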
