Post by Mike CopelandI would think a declared array of strings, stored and
accessed via the ZIP value won't work, as such an array would be too
large for the compiler (VS6.0).
VC6 is ancient and no longer supported. You should use the latest version,
VC9 SP1 (VS 2008 SP1).
Post by Mike CopelandGiven that, would an STL vector (or
something else) work well, both for storing and accessing?
std::vector is an excellent sequence container (it imitates builtin arrays,
but is far smarter). It can be used for key-value lookups when your keys
are unsigned integers, compact, and small. (If you have negative keys,
sparsely distributed keys, or compactly distributed keys that start at a
hundred million, you shouldn't use std::vector.) As it happens, zip codes
are unsigned integers, reasonably compact, and reasonably small, so you
could use std::vector. However, std::map is probably a better choice, as
it's an associative container. (std::map is an ordered associative
container, implemented with balanced binary trees; VC9 SP1 contains
std::tr1::unordered_map which is implemented with hashes, but by default you
should prefer std::map unless you have a reason to want hashes.) Using a
real associative container is a good idea because then you'll be able to
extend your program if in the future you need to support ZIP+4 or some other
key.
Here's an example of reading zip codes from a data file and loading them
into a std::map. std::map lookups are O(log N) which is "very fast". (For
example, my file has 29470 lines, so each lookup involves traversing about
15 nodes, which is essentially instantaneous on modern computers, unless for
some reason you need to do this a billion times per second.)
C:\Temp>type meow.cpp
#include <stdlib.h>
#include <exception>
#include <fstream>
#include <iostream>
#include <map>
#include <ostream>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
using namespace std;
using namespace std::tr1;
unsigned int zip_from_str(const string& s) {
istringstream iss(s);
unsigned int ret = 0;
iss >> ret;
return ret;
}
typedef map<unsigned int, string> zip_map_t;
typedef zip_map_t::const_iterator zip_map_ci_t;
zip_map_t read_zip_codes() {
zip_map_t ret;
// http://www.census.gov/geo/www/gazetteer/gazette.html links to:
// http://www.census.gov/tiger/tms/gazetteer/zips.txt
// http://www.census.gov/tiger/tms/gazetteer/zip90r.txt
ifstream f("zips.txt");
if (!f) {
throw runtime_error("RUNTIME ERROR: Couldn't open zips.txt.");
}
const regex
r("[^,]*,\"([0-9]{5})\",\"([A-Z]{2})\",\"([^\",]+)\",[^,]*,[^,]*,[^,]*,[^,]*");
for (string s; getline(f, s); ) {
smatch m;
if (!regex_match(s, m, r)) {
throw runtime_error("RUNTIME ERROR: Unrecognized line: " + s);
}
ret[zip_from_str(m[1])] = m[3].str() + ", " + m[2].str();
}
return ret;
}
int main() {
try {
const zip_map_t m = read_zip_codes();
cout << "Enter ZIP codes:" << endl << endl;
const regex r("[0-9]{5}");
for (string s; getline(cin, s); ) {
if (s == "bye") {
break;
}
if (!regex_match(s, r)) {
cout << "That's not a ZIP code." << endl;
continue;
}
const zip_map_ci_t i = m.find(zip_from_str(s));
if (i == m.end()) {
cout << "ZIP code not found." << endl;
continue;
}
cout << "Location: " << i->second << endl << endl;
}
} catch (const exception& e) {
cout << "Exception: " << e.what() << endl;
return EXIT_FAILURE;
} catch (...) {
cout << "Unknown exception." << endl;
return EXIT_FAILURE;
}
}
C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
meow.cpp
Generating code
Finished generating code
C:\Temp>meow
Enter ZIP codes:
98052
Location: REDMOND, WA
90210
Location: BEVERLY HILLS, CA
02115
Location: BOSTON, MA
10001
Location: NEW YORK, NY
20008
Location: WASHINGTON, DC
30303
Location: ATLANTA, GA
40202
Location: LOUISVILLE, KY
50309
Location: DES MOINES, IA
60601
Location: CHICAGO, IL
77063
Location: HOUSTON, TX
80202
Location: DENVER, CO
99950
Location: KETCHIKAN, AK
bye
C:\Temp>
If for some reason you couldn't use a separate data file and had to embed
this information directly in a C++ source file, you could process your file
(using sed or some other regular expression engine) into a series of
statements of the form:
m[ 2115] = "BOSTON, MA";
m[10001] = "NEW YORK, NY";
m[20008] = "WASHINGTON, DC";
(Warning: you would need to remove leading zeros in order to avoid the wrath
of octal numbers.)
Or, if you really wanted to use std::vector<std::string> (which offers O(1)
lookups, but doesn't like sparsely distributed keys), you'd want:
std::vector<std::string> v(100000);
v[ 2115] = "BOSTON, MA";
v[10001] = "NEW YORK, NY";
v[20008] = "WASHINGTON, DC";
(When embedding the data in a C++ source file, you could use const char *
instead of std::string, which would avoid dynamically allocating a copy of
string literals that are in your executable to begin with.)
Also, you could use std::vector<std::pair<unsigned int, std::string> >
sorted by the unsigned int. This offers O(log N) lookups like std::map, but
occupies less memory, and is sometimes a good choice.
Stephan T. Lavavej
Visual C++ Libraries Developer
Post by Mike CopelandI have a large data file that contains US ZIP codes and City & State
values. It has > 64K records (I can't load it into Excel), and I wish
to use it in a C/C++ application - I want to look up the City & State
for specific ZIP code values. I don't need any other information from
this file.
My question is: what's the best technique or language feature for
storing and accessing individual Cities & States via the ZIP code value
(10000-99999)? I would think a declared array of strings, stored and
accessed via the ZIP value won't work, as such an array would be too
large for the compiler (VS6.0). Given that, would an STL vector (or
something else) work well, both for storing and accessing? Please
advise. TIA