A journey into php-cli and scraping

01 Jan 09
  • Programming
  • Code

I recently had a couple of days to myself and wanted to experiment more with this php-cli thing I'd been thinking about. To help the process (and feed my guitar addiction; I have a serious problem) I decided to write a script that hits the Stupid Deal page on Musician's Friend and emails me if the deal of the day matches any term in a given list.

Prep

I'm pretty sure all Windows installs of PHP include php-cli, but to check, run this at the command prompt:

php -v

You should see something like the following; note the "(cli)":

PHP 5.2.6 (cli) (built: May  2 2008 18:02:07)
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
with Xdebug v2.0.3, Copyright (c) 2002-2007, by Derick Rethans
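If you also want the script itself to refuse to run anywhere but the command line (say, if a copy ends up under your web root), PHP_SAPI reports which interface PHP is running under, and for the command-line binary that value is "cli". A minimal sketch, assuming PHP 7.1 or later; is_cli() is a hypothetical name. php-win.exe, used for scheduling later on, is a windowless variant of the CLI, so I'd confirm what PHP_SAPI reports there before relying on this guard in a scheduled task.

```php
<?php
/**
 * Hypothetical helper: true when running under the command-line SAPI.
 */
function is_cli(): bool
{
    // PHP_SAPI holds the same value php_sapi_name() returns; "cli" for php-cli.
    return PHP_SAPI === 'cli';
}

if (!is_cli()) {
    // Not on the command line (e.g. requested through a web server): stop here.
    echo "This script must be run from the command line.\n";
    exit(1);
}
```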

Assuming that all worked out, here are the requirements for the script:
1. Must work like a *nix CLI program; it's just going to make things easier for me. For example, the program should be executed like:

C:\ProjectFiles\php_cli>php check_for_guitars.php --search="guitar,amp,tablature" --email="foo@bar.com"

2. Must have error checking and validation.
3. Must prevent duplicate notifications.
4. Provide a "help" mode (--help, -help, -h, -?).
5. Must be able to run as an automated task (the Windows equivalent of cron).

Argument Handling

To begin, I needed to change the way passed parameters are interpreted. Before version 5.3, PHP's built-in option parsing (getopt()) was pretty limited: long options like --search were only supported on a few systems, and it didn't work on Windows at all. But there's a function in the user notes of the PHP manual that helps a lot.
inc.php

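function arguments($argv) {
    $_ARG = array();
    foreach ($argv as $arg) {
        // --key=value, --key, -key=value or -abc
        if (preg_match('#^-{1,2}([A-Za-z0-9]*)=?(.*)$#', $arg, $matches)) {
            $key = $matches[1];
            switch ($matches[2]) {
                case '':
                case 'true':
                    $arg = true;
                    break;
                case 'false':
                    $arg = false;
                    break;
                default:
                    $arg = $matches[2];
            }

            /* make unix like -afd == -a -f -d */
            if (preg_match("/^-([A-Za-z0-9]+)/", $matches[0], $match)) {
                $string = $match[1];
                for ($i = 0; strlen($string) > $i; $i++) {
                    $_ARG[$string[$i]] = true;
                }
            } else {
                $_ARG[$key] = $arg;
            }
        } else {
            // anything that isn't an option (e.g. the script name) goes in 'input'
            $_ARG['input'][] = $arg;
        }
    }
    return $_ARG;
}

$input = arguments($argv);
print_r($input);
/*
Array
(
    [input] => Array
        (
            [0] => get_music.php
        )

    [search] => guitar,amp,tablature
    [email] => foo@bar.com
)
*/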
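Since --search carries a comma-separated list, it's handy to turn it into a clean array of terms as soon as the arguments are parsed, so stray spaces or doubled commas don't turn into bogus search terms. A minimal sketch, assuming PHP 7.1 or later; split_terms() is a hypothetical helper, not part of the manual-notes function above.

```php
<?php
/**
 * Hypothetical helper: split a comma-separated search string such as
 * " guitar, amp,,tablature " into trimmed, non-empty terms.
 */
function split_terms(string $search): array
{
    $terms = array_map('trim', explode(',', $search));
    // Drop empty entries left behind by doubled or trailing commas.
    $terms = array_filter($terms, function (string $t): bool {
        return $t !== '';
    });
    // Re-index so the result is a plain 0, 1, 2... list.
    return array_values($terms);
}

print_r(split_terms(' guitar, amp,,tablature '));
```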

Now that we can access the passed arguments, we need to validate them as we would in any other script. The code below checks whether a key is present in the $input array and holds a usable value; if not, it loops, prompting for a value, reading a line from STDIN and validating it, and breaks out of the loop once the value is valid.

//make sure we have a value for "search" (at least 2 characters)
$validate_search = FALSE;
if(!array_key_exists('search',$input)){
	$validate_search = TRUE;
} else {
	if(strlen(trim($input['search'])) < 2){
		$validate_search = TRUE;
	}
}

if($validate_search){
	echo "Please enter what to search for:\n";
	while(TRUE){
		$line = fgets(STDIN); // reads one line from STDIN
		if($line === FALSE){ // input ended (e.g. nobody at the keyboard)
			echo "No search terms given... exiting...\n";
			exit(1);
		}
		$input['search'] = trim($line);
		if(strlen($input['search']) >= 2){ //it's a valid string
			break;
		}
		echo "Please enter something to search for ";
		echo "(at least 2 characters):\n";
		echo "Example: \"guitar,bass,dvd\"\n";
	}
}
//make sure we have a valid email address
//(checkEmail_basic() is a helper defined elsewhere; it isn't shown in this post)
$validate_email = FALSE;
if(!array_key_exists('email',$input)){
	$validate_email = TRUE;
} else {
	if(!checkEmail_basic($input['email'])){
		$validate_email = TRUE;
	}
}

if($validate_email){
	echo "Please enter an email to send the alert to:\n";
	while(TRUE){
		$line = fgets(STDIN); // reads one line from STDIN
		if($line === FALSE){ // input ended
			echo "No email address given... exiting...\n";
			exit(1);
		}
		$input['email'] = trim($line);
		if(checkEmail_basic($input['email'])){ //it's a valid email
			break;
		}
		echo "Please enter a valid email address:\n";
	}
}
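The two validation loops above share the same shape, so they could be folded into one helper that reads from any stream. A sketch, assuming PHP 7.1 or later; prompt_until() is hypothetical, and check_email_basic() is only a stand-in for the post's checkEmail_basic() (whose code isn't shown), built on PHP's FILTER_VALIDATE_EMAIL filter. Taking the stream as a parameter also makes the prompt logic testable without a keyboard.

```php
<?php
/**
 * Hypothetical stand-in for the post's checkEmail_basic() (not shown in the
 * post): accepts anything PHP's built-in email filter accepts.
 */
function check_email_basic(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

/**
 * Hypothetical helper: print $prompt, then read lines from $in until
 * $isValid accepts one, printing $retry after each rejected line.
 * Returns the accepted (trimmed) value, or null if the input ends first.
 */
function prompt_until($in, string $prompt, callable $isValid, string $retry): ?string
{
    echo $prompt;
    while (($line = fgets($in)) !== false) {
        $value = trim($line);
        if ($isValid($value)) {
            return $value;
        }
        echo $retry;
    }
    return null; // EOF: input ended before a valid value arrived
}

// Usage in the script (reads from the real STDIN):
// $input['email'] = prompt_until(STDIN, "Please enter an email to send the alert to:\n",
//     'check_email_basic', "Please enter a valid email address:\n");
```

A null return is the caller's cue to exit rather than loop forever.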

Help

For the help mode, the PHP manual's command-line section has an example that maintains the *nix tradition of "--help, -help, -h or -?". It works like this:

C:\ProjectFiles\php_cli>php check_for_guitars.php --help

Takes a given string (--search) and searches the
Stupid Deal of the Day for a match. If a match is
found an email is sent to (--email)

 Usage:
 check_for_guitars.php --search="guitar,amp,tablature" --email="foo@bar.com"

The accompanying php code works like the below:

<?php
/**
 * Show the help text if the first argument is a help flag
 */
if(isset($argv[1]) && in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
?>
Takes a given string (--search) and searches the
Stupid Deal of the Day for a match. If a match is
found an email is sent to (--email)

 Usage:
 <?php echo $argv[0]; ?> --search="guitar,amp,tablature" --email="foo@bar.com"

<?php
	exit(0);
}
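The flag test can also live in a small function, which makes it easy to check that all four spellings from requirement 4 are handled. A sketch, assuming PHP 7.1 or later; help_requested() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: true when the first argument is one of the help flags.
 * $argv[0] is the script name, so any flag sits at index 1.
 */
function help_requested(array $argv): bool
{
    return isset($argv[1])
        && in_array($argv[1], ['--help', '-help', '-h', '-?'], true);
}
```

In the script it would replace the inline condition: if (help_requested($argv)) { ...print usage...; exit(0); }.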

With that in place, the rest of the script works just like a traditional web app.

Grab and Parse Page

The first thing we need to do is get the actual page. To do this I used Snoopy, a PHP class that simulates a web browser.

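require_once 'Snoopy.class.php'; // adjust the path to wherever you put Snoopy
$uri_to_check = 'http://www.musiciansfriend.com/stupid';
$snoopy = new Snoopy;
$snoopy->agent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
$snoopy->referer = "http://www.yahoo.com/";
if(!$snoopy->fetch($uri_to_check)){
	echo "Couldn't fetch the page: ".$snoopy->error."\n";
	exit(1);
}
$results = $snoopy->results;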
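If you'd rather not depend on Snoopy, PHP's own stream functions can send the same User-Agent and Referer headers. A bare-bones sketch, assuming PHP 7.1 or later with allow_url_fopen enabled; fetch_page() is a hypothetical name, and it doesn't try to replicate everything Snoopy does (cookies, for example).

```php
<?php
/**
 * Hypothetical helper: fetch $url with a custom User-Agent and Referer.
 * Returns the response body, or null on failure.
 */
function fetch_page(string $url, string $agent, string $referer): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "User-Agent: {$agent}\r\nReferer: {$referer}\r\n",
            'timeout' => 30, // seconds
        ],
    ]);
    // @ silences the warning on failure; the null return reports it instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

It would slot in where $snoopy->fetch() is called: $results = fetch_page($uri_to_check, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)', 'http://www.yahoo.com/'); with a null check in place of fetch()'s return value.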

The above fetches $uri_to_check and stores the entire page as a string in $results. Now we need to parse $results and pull out the values we need. Here's how to get the page title (the text of the first h1 tag):

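$pattern = '#<\s*h1[^>]*>(.*?)<\s*/h1\s*>#si';
preg_match($pattern, $results, $match);
// $match[1] holds the text between the tags (empty if no h1 was found)
$page_title = isset($match[1]) ? trim(strip_tags($match[1])) : '';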
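Pulling the extraction into a function makes the "no h1 on the page" case explicit and tidies up the text (nested tags, entities like &amp;, line breaks). A sketch, assuming PHP 7.1 or later; extract_h1() is a hypothetical helper, and like the pattern above it is still a regex, so it will break if the page markup changes.

```php
<?php
/**
 * Hypothetical helper: return the text of the first <h1> in $html, with
 * inner tags stripped, entities decoded and whitespace collapsed,
 * or null if the page has no <h1>.
 */
function extract_h1(string $html): ?string
{
    // s: let . match newlines; i: match <H1> as well as <h1>.
    if (!preg_match('#<h1\b[^>]*>(.*?)</h1\s*>#si', $html, $m)) {
        return null;
    }
    $text = html_entity_decode(strip_tags($m[1]), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace (including newlines) to single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```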

Next, split the comma-separated search terms and build an array of the ones that appear in the page title:

//split the search terms and check each one against the page title
$terms = explode(',', $input['search']);
$total = count($terms);
$match_for = array();
$FOUND = FALSE;
for($i=0;$i<$total;$i++){
	$term = trim($terms[$i]);
	if($term !== '' && stristr($page_title, $term) !== FALSE) {
		$match_for[] = $term;
		$FOUND = TRUE;
	}
}
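The same case-insensitive test can be written as a function that takes the title and the list of terms and returns the terms that matched. A sketch, assuming PHP 7.1 or later; find_matches() is a hypothetical name, and it uses stripos(), which does the same case-insensitive substring check as stristr() without building the tail string.

```php
<?php
/**
 * Hypothetical helper: return the search terms that appear anywhere in
 * $title, ignoring case.
 */
function find_matches(string $title, array $terms): array
{
    $found = [];
    foreach ($terms as $term) {
        $term = trim($term);
        // Skip empty terms: an empty needle would "match" everything.
        if ($term !== '' && stripos($title, $term) !== false) {
            $found[] = $term;
        }
    }
    return $found;
}
```

Because this is a substring test, "amp" will also match a title containing "Clamp"; matching on word boundaries with a regex would be stricter if that turns out to be a problem.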

If $FOUND is TRUE, check whether an alert has already been sent today, and send a new one if not:

// $DB and Mailer are database and mail classes defined elsewhere (not shown in this post)
$htmlmessage = <<<HTML
Search: %%search%%<br />
Title: %%title%%<br />
Sale Price: %%sale_price%%<br />
Original Price: %%og_price%%<br />
HTML;

if($FOUND){
	//check if an alert for this title was already sent today...
	$sql = "SELECT * FROM mf_checks WHERE title = '".$DB->es($page_title)."'
		AND DATE_FORMAT(`date_checked`,'%m') = '".date('m')."'
		AND DATE_FORMAT(`date_checked`,'%d') = '".date('d')."'
		AND DATE_FORMAT(`date_checked`,'%Y') = '".date('Y')."' LIMIT 1";
	$DB->query($sql);
	if($DB->getNumRows() == '1'){
		//alert has already been sent so break out...
		echo "Already sent today... exiting...";
		exit;
	}

	//match was found so get the price now
	//NOTE: the HTML marker the original split on didn't survive; $price_marker is a
	//hypothetical placeholder for whatever markup precedes the price block on the page
	$price_marker = '<!-- price block -->';
	$price_arr = explode($price_marker, $results);
	$price_lines = explode("\n", isset($price_arr[1]) ? $price_arr[1] : '');
	//assumed layout: sale price on the first line, "Reg $xx.xx" on the second
	$sale_price = trim(strip_tags($price_lines[0]));
	$og_price = isset($price_lines[1]) ? trim(str_replace('Reg ','',strip_tags($price_lines[1]))) : '';

	$htmlmessage = str_replace(
		array('%%search%%','%%title%%','%%sale_price%%','%%og_price%%'),
		array('"'.implode(', ',$match_for).'"',$page_title,$sale_price,$og_price),
		$htmlmessage
	);

	$mail = new Mailer();
	$mail->From = $input['email'];
	$mail->FromName = $input['email'];
	$mail->Subject = 'Found: '.$page_title;
	$mail->AltBody = strip_tags($htmlmessage);
	$mail->MsgHTML($htmlmessage);
	$mail->AddAddress($input['email']);
	if($mail->Send()){
		echo "Mail Sent";
	} else {
		echo "Mail Not Sent";
	}

	//add to the db
	$sql = "INSERT INTO mf_checks SET term = '".$DB->es(implode(', ',$match_for))."',
		title = '".$DB->es($page_title)."',
		sale_price = '".$DB->es($sale_price)."',
		og_price = '".$DB->es($og_price)."',
		date_checked = now(), alert_sent = '1'";
	$DB->query($sql);
}
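The duplicate check doesn't have to live in MySQL. If you don't have a database handy, a small state file keyed by date and title can do the same job for requirement 3. A sketch, assuming PHP 7.1 or later; sent_key(), already_sent(), mark_sent() and the file format (one "YYYY-MM-DD<TAB>title" line per alert) are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: the line recorded for one alert, "YYYY-MM-DD<TAB>title".
 * Tabs and line breaks in the title are flattened so each alert stays on one line.
 */
function sent_key(string $title, string $date): string
{
    return $date . "\t" . str_replace(["\t", "\r", "\n"], ' ', $title);
}

/**
 * Hypothetical helper: has an alert for $title already been recorded on $date?
 */
function already_sent(string $stateFile, string $title, string $date): bool
{
    if (!is_file($stateFile)) {
        return false; // nothing recorded yet
    }
    $lines = file($stateFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return false; // unreadable state file: treated as "not sent"
    }
    return in_array(sent_key($title, $date), $lines, true);
}

/**
 * Hypothetical helper: record that an alert for $title was sent on $date.
 */
function mark_sent(string $stateFile, string $title, string $date): void
{
    // FILE_APPEND keeps earlier records; LOCK_EX guards against overlapping runs.
    file_put_contents($stateFile, sent_key($title, $date) . "\n", FILE_APPEND | LOCK_EX);
}
```

In the script you'd call already_sent($file, $page_title, date('Y-m-d')) where the SELECT runs, and mark_sent() where the INSERT runs. Calling mark_sent() only after $mail->Send() succeeds would mean a failed send gets retried on the next scheduled run.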

Automating

To have the script check automatically on a regular interval, create a new task in Start->Programs->Accessories->System Tools->Task Scheduler. Set the schedule on the Triggers tab, then add something like the following on the Actions tab (the php-win.exe path goes in "Program/script" and the rest in "Add arguments"):

C:\php\php-win.exe C:\ProjectFiles\php_cli\check_for_guitars.php --search="guitar,amp,tablature" --email="foo@bar.com"

Note the use of php-win.exe, with its full path. If you use "php" by itself you'll get an annoying DOS box popping up every time the script executes; php-win.exe is the windowless version of the CLI, so nothing pops up.
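One side effect of php-win.exe: with no console, anything the script echoes ("Mail Sent", "Already sent today...") has nowhere to go. If you want a record of the scheduled runs, you could write those messages to a log file instead. A sketch, assuming PHP 7.1 or later; log_line() and whatever log path you pass it are hypothetical.

```php
<?php
/**
 * Hypothetical helper: append a timestamped line to $logFile.
 */
function log_line(string $logFile, string $message): void
{
    $line = date('Y-m-d H:i:s') . ' ' . $message . PHP_EOL;
    // FILE_APPEND keeps earlier runs; LOCK_EX avoids interleaved writes.
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}
```

Each echo in the script would become a call like log_line('C:\ProjectFiles\php_cli\check.log', 'Mail Sent').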


© Copyright 2025 | mithra62
