Searching the web for concordances in real time.
A tool for building ad hoc corpora from the web based on search engine queries. KWiCFinder conducts your online searches without supervision. It returns Key Word in Context abstracts that highlight your search terms, so you can evaluate the usefulness of the documents matching your query.
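To make the Key Word in Context idea concrete, here is a minimal sketch in Python. It is not KWiCFinder's own code; the function name `kwic` and the `width` parameter are assumptions made for illustration. It prints each match with a fixed window of text on either side, so the keyword lines up in a column.

```python
import re


def kwic(text, keyword, width=30):
    """Return one context line per occurrence of `keyword` (case-insensitive).

    `kwic` and `width` are illustrative names, not part of any real tool.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords form a column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


if __name__ == "__main__":
    sample = "The corpus was built from the web. A web Corpus is cheap."
    for line in kwic(sample, "corpus", width=10):
        print(line)
```

Scanning the aligned column is what lets you judge quickly whether a document uses the term in the sense you are after.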
Simple Utilities to Bootstrap Corpora And Terms from the Web. The Perl scripts included in the BootCaT toolkit implement an iterative procedure to bootstrap specialized corpora and terms from the web, requiring only a list of “seeds” (terms that are expected to be typical of the domain of interest) as input.
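As a rough illustration of how a seed list can drive such a procedure, the sketch below combines seeds into short search queries. This is an assumption about one plausible first step, written in Python rather than Perl, and it is not taken from the BootCaT scripts; `build_query_tuples`, `tuple_size`, and `n_queries` are hypothetical names.

```python
import itertools
import random


def build_query_tuples(seeds, tuple_size=3, n_queries=10, rng=None):
    """Combine seed terms into search-engine queries (illustrative only).

    Every query is a distinct combination of `tuple_size` seeds; at most
    `n_queries` of them are returned, in random order.
    """
    rng = rng or random.Random()
    unique_seeds = sorted(set(seeds))
    combos = list(itertools.combinations(unique_seeds, tuple_size))
    rng.shuffle(combos)
    return [" ".join(combo) for combo in combos[:n_queries]]


if __name__ == "__main__":
    seeds = ["corpus", "concordance", "lemma", "collocation", "tokenizer"]
    for query in build_query_tuples(seeds, n_queries=4, rng=random.Random(0)):
        print(query)
```

In an iterative setup, the pages retrieved for these queries would be mined for new candidate terms, which could then be fed back in as seeds for the next round.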
HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server onto your computer. HTTrack preserves the original site's relative link structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is fully configurable and has an integrated help system.
When editing HTML, it's easy to make mistakes. Wouldn't it be nice if there were a simple way to fix these mistakes automatically and tidy up sloppy editing into nicely laid out markup? Well, now there is! Dave Raggett's HTML TIDY is a free utility for doing just that. It also works great on the atrociously hard-to-read markup generated by specialized HTML editors and conversion tools, and it can help you identify where you need to pay further attention to making your pages more accessible to people with disabilities.
Web2Text is an HTML to ASCII text converter. Unlike most others, however, this one not only has an easy-to-use graphical interface but also produces a nicely laid out text version and keeps URLs visible. Only a minimum of post-conversion editing is required!
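For readers who want to see what "keeps URLs visible" can mean in practice, here is a small sketch built on Python's standard `html.parser` module. It is not how Web2Text itself works; the class `TextWithLinks` and the function `html_to_text` are hypothetical names, and the choice to print each link target in angle brackets after the link text is an assumption.

```python
from html.parser import HTMLParser

# Tags that should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextWithLinks(HTMLParser):
    """Collect page text, appending each link target after its link text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._href = None
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a" and self._href:
            self.parts.append(f" <{self._href}>")
            self._href = None

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Convert HTML to plain text with link URLs kept inline."""
    parser = TextWithLinks()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = '<p>See <a href="http://example.org/">the site</a> now.</p>'
    print(html_to_text(page))
```

Keeping the URL next to its link text means the converted file still records where each link pointed, which is useful when the text ends up in a corpus.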