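For background on web crawlers and regular expressions, please look it up yourself on Baidu or in blog posts.
The main code is below. It is mostly the raw source, and some of the // comments can be ignored. (Experts, feel free to move along, no thanks needed~)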
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailCheck {

    // Loose pattern: relatively imprecise.
    private static final String MAIL_REGEX = "\\w+@\\w+(\\.\\w+)+";
    // A somewhat more precise alternative:
    // private static final String MAIL_REGEX = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";

    // Fetches the page at addressUrl and collects every email address found in it.
    // Sample URLs used while testing:
    //   http://192.168.56.1:8080/myweb/mail.html
    //   http://www.douban.com/group/topic/24022171/
    public static StringBuffer getWebMail(String addressUrl) throws Exception {
        URL url = new URL(addressUrl);
        URLConnection conn = url.openConnection();
        // try-with-resources closes the reader and the underlying stream when done.
        try (BufferedReader bufin = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            return collectMails(bufin);
        }
    }

    // Not yet revised.
    // Reads a local text file (e.g. F:\java_p\webmail.txt) and collects every email address found in it.
    public static StringBuffer getLocalMail(String addressLocal) throws Exception {
        try (BufferedReader buff = new BufferedReader(new FileReader(addressLocal))) {
            return collectMails(buff);
        }
    }

    // Shared loop: scans the input line by line and appends each match on its own line.
    // An earlier, commented-out variant wrote the matches to a file
    // (F:\java_p\MailFromWeb.txt or F:\java_p\mail2.txt) instead of returning them.
    private static StringBuffer collectMails(BufferedReader reader) throws IOException {
        Pattern p = Pattern.compile(MAIL_REGEX);
        StringBuffer sbuf = new StringBuffer();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                ++count;
                sbuf.append(m.group());
                sbuf.append("\r\n"); // line break after each address
            }
        }
        sbuf.append("Found " + count + " email addresses in total");
        return sbuf;
    }
}
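The listing uses a loose pattern and keeps a stricter one in a comment. To see how the two differ in practice, here is a minimal sketch that runs both against the same line of text. The class name MailRegexDemo, the helper findAll, and the sample line are made up for illustration and are not part of the original tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regex strings come from the listing.
class MailRegexDemo {
    // The loose pattern that the listing actually uses.
    static final Pattern LOOSE = Pattern.compile("\\w+@\\w+(\\.\\w+)+");
    // The stricter pattern that the listing keeps in a comment.
    static final Pattern STRICT =
            Pattern.compile("[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Returns every substring of text that matches pattern, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample input: two plausible addresses and one junk token.
        String line = "Contact: alice@example.com, bob_1@mail.example.org; bad: x@1.2";

        System.out.println(findAll(LOOSE, line));
        // prints [alice@example.com, bob_1@mail.example.org, x@1.2]

        System.out.println(findAll(STRICT, line));
        // prints [alice@example.com, bob_1@mail.example.org]
    }
}
```

The loose pattern accepts x@1.2 because \w also matches digits, while the stricter one requires letters after each dot. Neither pattern is a complete email validator; both are only rough filters for scraping.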
A simple interface (I'm a beginner, so experts please go easy on me):
[Screenshot: crawling a web address]
[Screenshot: crawling a local file]
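I have uploaded the email address crawler tool to CSDN; anyone who needs it can download it from the link below:
To satisfy beginners' curiosity (experts, please ignore this), I'll upload the source code as well: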