User:Niyogi
magazine
- downloaded 615 (385 .com) raw content (bz2 format)
Next steps:
- build feature lists using new wikipedia lexicon
category
- have amazon and shopping for lexicon.txt
Next steps:
- need ebay; figure out soap/php interface to ebay and get
- rebuild cat maps
dmoz
- have 120K/174K front pages; 1link.csv has "key features" now
Next steps:
- build corpus of key features for each category in 1link.csv
ontok/ExtractAttributesfromText
- prototyped code, seen it work for "thinkpad laptops"
Next steps:
- test out search_by_product/brand on "600x ipod nano" etc.
- write search_by_model code
ontok/ExtractLocations
- use new city/state features to detect city/state combos quickly on "contact us" pages
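
A minimal sketch of how city/state combos might be picked out of "contact us" text. The lookup tables, the function name `find_city_state`, and the "City, ST" address layout are all assumptions made for illustration; the actual city/state feature lists are not shown in these notes.

```php
<?php
// Hypothetical lookup tables standing in for the real city/state features.
$states = ['CA' => true, 'NY' => true, 'WA' => true];
$cities = ['palo alto' => true, 'new york' => true, 'seattle' => true];

/**
 * Return "City, ST" pairs found in $text.
 * Assumes the US layout "City, ST" or "City ST" (optionally followed by a zip).
 */
function find_city_state(string $text, array $cities, array $states): array
{
    $found = [];
    // Candidate: one to three capitalized words, optional comma, two uppercase letters.
    $pattern = '/\b((?:[A-Z][a-z]+)(?: [A-Z][a-z]+){0,2}),? ([A-Z]{2})\b/';
    if (!preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        return $found;
    }
    foreach ($matches as $m) {
        $state = $m[2];
        if (!isset($states[$state])) {
            continue; // two capitals that are not a known state code
        }
        // The candidate may carry leading street words ("Street Palo Alto"),
        // so try the longest suffix first and shorten until a city matches.
        $words = explode(' ', $m[1]);
        for ($i = 0; $i < count($words); $i++) {
            $city = implode(' ', array_slice($words, $i));
            if (isset($cities[strtolower($city)])) {
                $found[] = "$city, $state";
                break;
            }
        }
    }
    return $found;
}

// Usage: yields ['Palo Alto, CA'] with the tables above.
$hits = find_city_state('Visit us at 123 Main Street, Palo Alto, CA 94301', $cities, $states);
```

Checking the state code before the city keeps the cost low, since most capitalized-word candidates on a page are rejected by a single array lookup.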
ontok/wikipedia/products
- have wikipedia and product lexicon merged
Next steps:
 foreach ($titlearr as $title) {
     // expand the associations on:
     //   productbrand: any product-brand combo appearing
     //   brandmodel: anything that looks like a model (alphanumeric or 00 or short)
     //   productfeature: any product-feature combo appearing
     //   productunit: any product-unit mapping
 }
 foreach ($brandarr as $brand) {
     // determine product associations
 }
 foreach ($brandmodel as $brand => $modelarr) {
     foreach ($modelarr as $model => $n) {
         // determine product associations
     }
 }

how to determine product associations:
- read in the productbrand table
- read yhoo search response, google suggest response
- detect "ma" features from output
- for brand links, check the productbrand table
- for brand-model links, check the productbrand table
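
The brandmodel rule above ("anything that looks like a model (alphanumeric or 00 or short)") could be sketched roughly as follows. The function names `looks_like_model` and `build_brandmodel`, the 3-character cutoff for "short", and the reading of "00" as "contains a double zero" are assumptions for illustration, not the actual implementation. The output uses the same `$brand => [$model => $n]` shape as the `$brandmodel` loop in the notes.

```php
<?php
/**
 * Guess whether $token looks like a model name.
 * Assumed reading of "alphanumeric or 00 or short":
 *   alphanumeric: mixes letters and digits, e.g. "600x"
 *   00:           contains a double zero, e.g. "5000"
 *   short:        at most $maxShort characters, e.g. "g4"
 */
function looks_like_model(string $token, int $maxShort = 3): bool
{
    $token = strtolower(trim($token));
    if ($token === '') {
        return false;
    }
    $mixed = preg_match('/[a-z]/', $token) === 1
        && preg_match('/[0-9]/', $token) === 1;
    $doubleZero = strpos($token, '00') !== false;
    $short = strlen($token) <= $maxShort;
    return $mixed || $doubleZero || $short;
}

/**
 * Count model-like tokens that directly follow a known brand in each title.
 * $brands is a hypothetical set of lowercase brand names (brand => true).
 */
function build_brandmodel(array $titles, array $brands): array
{
    $brandmodel = [];
    foreach ($titles as $title) {
        $words = preg_split('/\s+/', strtolower(trim($title)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $i => $word) {
            if (!isset($brands[$word]) || !isset($words[$i + 1])) {
                continue; // not a brand, or nothing follows it
            }
            $next = $words[$i + 1];
            if (looks_like_model($next)) {
                $brandmodel[$word][$next] = ($brandmodel[$word][$next] ?? 0) + 1;
            }
        }
    }
    return $brandmodel;
}

// Usage with made-up titles:
$bm = build_brandmodel(
    ['ThinkPad 600x laptop', 'thinkpad t42 battery', 'ThinkPad 600x'],
    ['thinkpad' => true]
);
// $bm is ['thinkpad' => ['600x' => 2, 't42' => 1]]
```

The "short" test on its own also accepts common words such as "the" or "for", so in practice it would probably need to be combined with the productbrand table check listed above.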