Monthly Archives: February 2007
Using Nutch 0.8.1 for Intranet Crawling and Searching
This article uses Nutch 0.8.1 to build a full-text index for a few specified websites. It does not use the distributed capabilities provided by Hadoop; indexing simply runs on a single machine. To use Nutch's distributed capabilities, you will need to become familiar with Hadoop.

The directory structure after deploying Nutch is assumed to be:

/home/hys/nutch-deployed
    /nutch-0.8.1 (Nutch 0.8.1 installation goes here)
    /nutch-0.8.1-web (Nutch web module for searching goes here)

1. Setting environment variable(s)

$ export NUTCH_JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.08
$ export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.08
$ export NUTCH_HOME=/home/hys/nutch-deployed/nutch-0.8.1
…