Parallel C4.5 (PC4.5)
Build classification trees in parallel.
Why PC4.5?
If you have C4.5 and access to a network of workstations,
PC4.5 will help you make better use of C4.5. PC4.5 offers you these advantages:
- It is faster. In an N-trial C4.5 run, a single process builds N
classification trees one by one and then picks the best one. In PC4.5,
each of the N trials is handled by its own process, and each process
runs on a different machine (if N or more machines are available).
- It is fault-tolerant. PC4.5 automatically assigns a process to
a machine when that machine is idle (i.e. no activity by the machine's
owner). If the owner of a machine comes back, or the machine crashes, during
a PC4.5 computation, the PC4.5 process automatically retreats and
resumes on a different idle machine.
- It supports multiple platforms. PC4.5 runs on SunOS, Solaris, and
Linux machines (for HPUX, IRIX, and ALPHA, please contact the author).
Networked workstations on different platforms can run processes of
a single PC4.5 program at the same time.
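The speed-up described in the first point can be pictured with a small sketch. This is not PC4.5 code: `run_trial` and `best_of_trials` are hypothetical names, the "trial" only simulates an error rate instead of building a tree, and a local thread pool stands in for processes running on separate workstations.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trial(seed):
    """Hypothetical stand-in for one C4.5 trial.

    A real trial would build a classification tree and measure its error.
    Here we only simulate an error rate that depends on the seed.
    """
    rng = random.Random(seed)
    return rng.random(), seed  # (simulated error rate, trial id)


def best_of_trials(n_trials, max_workers=None):
    """Run n_trials independent trials concurrently and keep the best one."""
    # Assumption: a thread pool on one machine plays the role of
    # one process per idle workstation.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_trial, range(n_trials)))
    # Tuples compare by their first element, so min() picks the lowest error.
    return min(results)


if __name__ == "__main__":
    error, seed = best_of_trials(10)
    print(f"best trial: id={seed}, simulated error={error:.3f}")
```

Because the trials do not depend on one another, the only coordination needed is collecting the N results and choosing the minimum, which is why the work spreads across machines so easily.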
How Does It Do It?
PC4.5 is built with the Persistent Linda (PLinda) system, a software
system for robust distributed parallel computing developed at
New York University.
To get more information on PLinda, please visit our web site
or send email to plinda@cs.nyu.edu.
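For readers unfamiliar with Linda, the general model is a shared tuple space into which processes put tuples (`out`) and from which they withdraw matching tuples (`in`). The sketch below is a toy, single-machine illustration of that general model applied to a bag of trials. It is not the PLinda API and does not model PLinda's persistence or fault tolerance; `TupleSpace`, `in_`, `fake_error`, and `run` are hypothetical names, and the error values are placeholders.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space (illustration only, not PLinda)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Add a tuple to the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, *pattern):
        """Remove and return the first matching tuple, blocking until one exists.

        None in the pattern acts as a wildcard field.
        """
        with self._cond:
            while True:
                for tup in self._tuples:
                    if len(tup) == len(pattern) and all(
                        p is None or p == v for p, v in zip(pattern, tup)
                    ):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()


def fake_error(trial):
    """Placeholder for the error rate of the tree built in a trial."""
    return ((trial * 37) % 11) / 10.0


def worker(space):
    """Repeatedly take a task tuple, do the trial, and post a result tuple."""
    while True:
        _, trial = space.in_("task", None)
        if trial < 0:  # negative id is the stop signal
            return
        space.out("result", trial, fake_error(trial))


def run(n_trials, n_workers):
    space = TupleSpace()
    for trial in range(n_trials):
        space.out("task", trial)
    threads = [threading.Thread(target=worker, args=(space,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    results = [space.in_("result", None, None) for _ in range(n_trials)]
    for _ in threads:
        space.out("task", -1)  # one stop signal per worker
    for t in threads:
        t.join()
    return min(results, key=lambda r: r[2])


if __name__ == "__main__":
    print(run(n_trials=5, n_workers=3))
```

Workers never address each other directly; they only exchange tuples through the shared space, so the number of workers can differ from the number of trials.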
Future Work
- Visualization. Convert a decision tree generated by PC4.5 into a
ThinkSheet,
so that it is easier (and more fun) to consult.
- Suggestions -- please feel free to send mail to binli@cs.nyu.edu.
People
Last modified by Bin Li, January 22, 1997.