Frequently Asked Questions
Does pyspider Work with Windows?
Yes, it should: some users have made it work on Windows. However, since I don't have a Windows development environment, I cannot test it myself. Here are some tips for users who want to use pyspider on Windows:
- Some packages need binary libraries (e.g. pycurl, lxml) that you may not be able to install with pip. Windows binary packages for them can be found at [~gohlke/pythonlibs/](
- Make a clean environment with [virtualenv](
- Try a 32-bit version of Python, especially if you are facing crashes.
- Avoid using Python 3.4.1 ([#194](, [#217](
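If you are unsure which interpreter you are running, a small check like the one below can flag the two interpreter-level tips above. This is a hypothetical helper, not part of pyspider; it only inspects the Python version and the pointer size (32 vs 64 bit) and does not check for binary packages.

```python
import struct
import sys


def windows_setup_warnings(version_info=None, pointer_bits=None):
    """Return warnings for the interpreter-level tips (hypothetical helper).

    version_info: a (major, minor, micro) tuple; defaults to the running Python.
    pointer_bits: 32 or 64; defaults to the running interpreter's bitness.
    """
    if version_info is None:
        version_info = tuple(sys.version_info[:3])
    if pointer_bits is None:
        # The size of a C pointer in bytes, times 8, gives the interpreter's bitness.
        pointer_bits = struct.calcsize("P") * 8

    warnings = []
    if tuple(version_info[:3]) == (3, 4, 1):
        warnings.append("Avoid Python 3.4.1 (see pyspider issues #194 and #217).")
    if pointer_bits == 64:
        warnings.append("64-bit Python detected; if pyspider crashes, try a 32-bit Python.")
    return warnings


if __name__ == "__main__":
    for message in windows_setup_warnings():
        print(message)
```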
Unreadable (Garbled) Text Returned from PhantomJS
PhantomJS doesn't support gzip, so don't set the `Accept-Encoding` header to `gzip`.
How to Delete a Project?
Set the project's `group` to `delete` and its `status` to `STOP`, then wait 24 hours. You can change how long the scheduler waits before deleting a project via `scheduler.DELETE_TIME`.
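The deletion rule above amounts to a simple time comparison. The sketch below is a simplified illustration, not pyspider's actual scheduler code; it assumes the 24-hour window is measured from the project's last update time (`updatetime`, a Unix timestamp), and the function name `is_due_for_deletion` is hypothetical.

```python
import time

# Default waiting period described above: 24 hours, in seconds.
# In pyspider this value is configured via scheduler.DELETE_TIME.
DELETE_TIME = 24 * 60 * 60


def is_due_for_deletion(group, status, updatetime, now=None, delete_time=DELETE_TIME):
    """Hypothetical sketch of the deletion rule, not pyspider's implementation.

    Assumes the waiting period starts at `updatetime`, i.e. when you changed
    the project's group/status.
    """
    if now is None:
        now = time.time()
    # Only stopped projects in the "delete" group are candidates.
    if group != "delete" or status != "STOP":
        return False
    return now - updatetime >= delete_time
```

With the default window, a project stopped at time `0` becomes eligible at `now=86400`; passing a smaller `delete_time` mirrors lowering `scheduler.DELETE_TIME`.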
How to Restart a Project?
#### Why
This comes up after you modify a script and want to crawl everything again with a new strategy. However, because the [age](/apis/self.crawl/#age) of the URLs has not expired, the scheduler will discard all of the new requests.
#### Solution
1. Create a new project.
2. Use an [itag](/apis/self.crawl/#itag) within `Handler.crawl_config` to specify the version of your script.
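To see why changing the `itag` works, here is a simplified sketch of the decision the scheduler makes when a request arrives for a URL it has already crawled. It is an illustration under stated assumptions, not pyspider's code: the function name `should_recrawl` is hypothetical, and a negative `age` is treated here as "never expires".

```python
import time


def should_recrawl(new_itag, old_itag, age, last_crawl_time, now=None):
    """Hypothetical sketch of the re-crawl decision (illustration only).

    new_itag / old_itag: itag of the incoming request and of the stored task.
    age: seconds a crawled URL stays fresh; negative means "never expires" here.
    last_crawl_time: Unix timestamp of the previous crawl.
    """
    if now is None:
        now = time.time()
    # A changed itag marks the request as a new version of the task,
    # so it is accepted even though the age has not expired.
    if new_itag and new_itag != old_itag:
        return True
    # Otherwise the request is accepted only once the age has expired.
    if age >= 0 and last_crawl_time + age < now:
        return True
    # Everything else is discarded, which is the behaviour described above.
    return False
```

In a pyspider script this corresponds to something like `crawl_config = {'itag': 'v2'}` on your `Handler`, bumping the value each time you change the crawling strategy.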
How to Use WebDAV Mode?
Mount `http://hostname/dav/` on your filesystem, then edit or create scripts with your favourite editor.
> OSX: `mount_webdav http://hostname/dav/ /Volumes/dav`
> Linux: Install davfs2, `mount.davfs http://hostname/dav/ /mnt/dav`
> VIM: `vim http://hostname/dav/`
When you edit a script outside the WebUI, switch the WebUI to `WebDAV Mode` while debugging. After you save the script in your editor, the WebUI loads the latest version and uses it to debug your code.
What does the progress bar mean on the dashboard?
Hover your mouse over a progress bar to see an explanation.
For the 5m, 1h and 1d bars, the numbers are the events triggered within the last 5 minutes, 1 hour and 1 day. For the all bar, the numbers are the total number of tasks in each status.
Only projects in DEBUG or RUNNING status show progress.
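As a rough illustration of what the 5m, 1h and 1d numbers represent, the sketch below counts event timestamps falling inside each window ending at `now`. pyspider keeps its own counters internally; the function name `count_events_by_window` and the plain list of timestamps are assumptions made for this example.

```python
import time

# Window lengths for the 5m, 1h and 1d progress bars, in seconds.
WINDOWS = {"5m": 5 * 60, "1h": 60 * 60, "1d": 24 * 60 * 60}


def count_events_by_window(event_times, now=None):
    """Count events (Unix timestamps) inside each window ending at `now`.

    Hypothetical illustration only; not how pyspider stores its counters.
    """
    if now is None:
        now = time.time()
    return {
        name: sum(1 for t in event_times if now - length <= t <= now)
        for name, length in WINDOWS.items()
    }
```

Events 10 seconds, 10 minutes, 2 hours and 25 hours old would be counted as `{"5m": 1, "1h": 2, "1d": 3}`: each longer window includes the events of the shorter ones.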
How many scheduler, fetcher, processor and result_worker instances do I need? (Or: pyspider stopped working)
You can have only one scheduler; how many fetchers, processors and result_workers you need depends on where the bottleneck is. You can use the queue status on the dashboard to find the bottleneck of the system:
![run one step](imgs/queue_status.png)
For example, the number between the scheduler and the fetcher indicates the size of the queue from the scheduler to the fetchers. When it hits 100 (the default maximum queue size), the fetcher may have crashed, or you should consider adding more fetchers.
The number `0+0` below the fetcher indicates the queue sizes for new tasks and status packs sent from the processors to the scheduler. You can hover your mouse over the numbers to see tooltips.
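The reasoning above (a full queue means its consumer cannot keep up) can be written down as a small check. This is a hypothetical sketch: the queue names below are assumed to match those pyspider uses internally and should be treated as an assumption, and the queue sizes are read by hand from the dashboard rather than from any pyspider API.

```python
# Hypothetical mapping from each queue to the component that consumes it.
# Queue names are an assumption for this sketch.
CONSUMERS = {
    "scheduler2fetcher": "fetcher",
    "fetcher2processor": "processor",
    "processor2result": "result_worker",
    "newtask_queue": "scheduler",
    "status_queue": "scheduler",
}

DEFAULT_MAXSIZE = 100  # default maximum queue size mentioned above


def find_bottlenecks(queue_sizes, maxsize=DEFAULT_MAXSIZE):
    """Return a suggestion for every queue that has reached its maximum size."""
    suggestions = []
    for queue, size in sorted(queue_sizes.items()):
        if size < maxsize:
            continue
        consumer = CONSUMERS.get(queue, "unknown component")
        if consumer == "scheduler":
            # Only one scheduler is supported, so adding more is not an option.
            suggestions.append(f"{queue} is full: the single scheduler is falling behind")
        else:
            suggestions.append(
                f"{queue} is full: check that the {consumer} has not crashed, "
                f"or add more {consumer} instances"
            )
    return suggestions
```

For instance, `find_bottlenecks({"scheduler2fetcher": 100, "fetcher2processor": 3})` points at the fetchers, matching the example in the text.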