Saturday, December 17, 2011

Program to Search for a Substring inside a String

Problem: If the designers of the String class had not defined the version of indexOf that takes a string argument, you could implement it using the other methods available in the library. Without calling indexOf directly, implement a method myIndexOf that behaves in exactly the same way.


What it looks like:




/* File: myIndexOfTester.java
 * ---------------
 * This program has a run method that lets the user enter in a (supposed to be) big string and a
 * little string. Then it uses the private method myIndexOf to see whether the little string exists
 * inside the big string and if so, return the index within the big string that the little string
 * exists at.
 * 
 * There is already an indexOf method in Java's standard String class, but the Stanford folks 
 * thought it would be good for us to try implementing it ourselves with other methods in the
 * String class.
 */
import acm.program.*;
public class myIndexOfTester extends ConsoleProgram {
    public void run() {
       String enteredString = readLine("Enter a string: ");
       String searchedForString = readLine("Enter the string you're searching for inside it: ");
       print(myIndexOf(searchedForString, enteredString));
    }
    private int myIndexOf(String littleString, String bigString) {
//"Real" program would check that littleString is longer than bigString & raise and exception of it weren't

//For every letter in bigString MINUS the length of littleString (to guard against out-of-bounds
//errors), check if littleString equals the segment of bigString that starts at that index.
        for (int i = 0; i < bigString.length()-1-littleString.length()-1; i++) {      
            if (littleString.equals(bigString.substring(i, i+littleString.length() ))) {
                return i;
            }           
        }
        return -1; //Signifies that littleString doesn't exist inside bigString         
    }
}




The basic gist of this problem, or at least my approach to it, is you want to take your little string and compare it to substrings inside the big string. In particular, you want to iterate through each letter of the big string and carve out a substring starting at that index and of an equal length to the little string. If they are equivalent, then return the index. If you never get an equivalency, then return -1 (as is the existing convention).


What made it tricky: Two things! 1) Off-by-one errors, and 2) Out-of-bounds errors.


** Off-by-one Errors **


Let's say your big string is "HELLO WORLD". This string has 11 characters, including the space, so the method bigString.length() would return 11. If you iterated from 0 to 11, however, your last iteration would be looking at nothing since there's no character at space "11"! Therefore, you have to make the outer bound of your loop is the length of the big string minus one.




** Out-of-bounds Errors **


Let's say your big string is "HELLO", and the little string you're inspecting "HELLO" for is "HI." You go thru every letter in "HELLO," seeing if the 2-letter substring at that letter is equivalent to "HI".


However, what happens on the last couple of iterations of your loop? If your index is the letter "O" in "HELLO", you're comparing "HI" with "O" and then nothing! To keep this from happening, you need to stop your comparisons so that all the letters in "HI" have something in the string "HELLO" to get compared to.






The last tricky thing is, the problem specified that the method has to behave the *same* way as the existing indexOf. That takes in just one argument (the little string) and you apply it to the big string. My implementation takes in both the big string and the little string as arguments, so I had been worried that I hadn't actually solved the requirement. 


However, how is it possible to write your own implementation as a method by itself without "knowing" the length of the big string? Not sure whether this is do-able.... I checked the textbook to see whether the authors printed this problem. (I have to confess that I've been studying with a PDF of the pre-print copy of the textbook since it's easier to have on hand, and I leave the heavy text at home). I saw that the printed version omitted this problem, so I'm assuming that the editors decided that it wasn't quite a perfectly designed problem because of that mismatch. Thoughts?

No comments:

Post a Comment