Monthly Archives: February 2007
Using Nutch 0.8.1 for Intranet Crawling and Searching
This post uses Nutch 0.8.1 to build a full-text index for a few specified websites. It does not use the distributed capabilities provided by Hadoop; all indexing is done on a single machine. If you need Nutch's distributed capabilities, you will need to get familiar with Hadoop.

The directory layout after deploying Nutch is assumed to be:

    /home/hys/nutch-deployed
        /nutch-0.8.1      (Nutch 0.8.1 installation goes here)
        /nutch-0.8.1-web  (Nutch web module for searching goes here)

1. Setting environment variable(s)

    $ export NUTCH_JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.08
    $ export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.08
    $ export NUTCH_HOME=/home/hys/nutch-deployed/nutch-0.8.1

…
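The exports in step 1 can be sanity-checked before going any further. Below is a minimal sketch, assuming a POSIX shell; `check_dir_var` is a hypothetical helper written for this illustration and is not part of Nutch. It only confirms that each variable is set and names an existing directory.

```shell
#!/bin/sh
# check_dir_var NAME
# Hypothetical helper: succeeds only if the variable called NAME is set
# and its value is an existing directory.
check_dir_var() {
    name=$1
    # Indirect lookup of the variable whose name is in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value" >&2
        return 1
    fi
    echo "$name=$value"
}

# The three variables exported in step 1.
for var in NUTCH_JAVA_HOME JAVA_HOME NUTCH_HOME; do
    check_dir_var "$var" || echo "fix $var before running Nutch" >&2
done
```

Running this in the same shell session as the exports prints one `NAME=value` line per variable that passes the check, and a message on stderr for any that does not.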