Channel: Statalist

Handling 10,000 files named with 11-digit numbers

Hello. I have over 10,000 zipped txt files that I would like to unzip, convert to Stata format, and append into a single dta dataset. The problem is that the zip files are named with 11-digit numbers, with large gaps between consecutive names, so looping over the whole range with forvalues combined with capture confirm file takes forever. Stata has already been running for 20 hours and has not unzipped even half of the files. I hope there is a more efficient way of doing this.

Also, is it possible to work with zipped files in Stata without having to unzip them first?

Below is the code which is taking forever:

Code:
forvalues i=11000000000/54000000000 {
    capture confirm file "`i'.zip"
    if _rc==0 {
        unzipfile `i'.zip, replace
    }
}

clear
tempfile temp
save `temp', emptyok

forvalues i=11000000000/54000000000 {
    * each archive is assumed to unpack to a text file named after the same number
    capture confirm file "`i'.txt"
    if _rc==0 {
        * read the fixed-width file for this number rather than one hard-coded file
        infix uf 1-2 mun 3-7 dist 8-9 subdist 10-11 set 12-15 st_set 16-16 str lati 322-336 str longe 337-351 ///
            tp 472-473 str subtp 474-513 quadra 545-547 face 548-550 cep 551-558 using "`i'.txt", clear

        append using `temp'
        display "`i'"
        save `"`temp'"', replace
    }
}

save all.dta, replace
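
For comparison, here is a minimal sketch of one way to avoid scanning the whole numeric range: ask Stata for the list of zip files that actually exist and loop over that list. It rests on assumptions not stated in the post: all archives sit in the current working directory, each NNNNNNNNNNN.zip unpacks to a single NNNNNNNNNNN.txt, and the list of names fits in a macro (with 10,000 names it may exceed the macro length limit of smaller Stata flavors, in which case a narrower pattern such as "1*.zip" could be processed in batches). The macro names zips, stem and all are illustrative. The column layout is copied from the infix command above.

```stata
* Assumption: archives are in the current working directory and each
* NNNNNNNNNNN.zip contains one fixed-width file NNNNNNNNNNN.txt.
local zips : dir "." files "*.zip"

* start with an empty dataset to accumulate into
clear
tempfile all
save `all', emptyok

foreach z of local zips {
    * strip the extension to get the 11-digit stem
    local stem : subinstr local z ".zip" ""

    unzipfile `"`z'"', replace

    * read the extracted fixed-width file, replacing data in memory
    infix uf 1-2 mun 3-7 dist 8-9 subdist 10-11 set 12-15 st_set 16-16 ///
        str lati 322-336 str longe 337-351 tp 472-473 str subtp 474-513 ///
        quadra 545-547 face 548-550 cep 551-558 using `"`stem'.txt"', clear

    * add everything read so far and store the running total
    append using `all'
    save `all', replace

    * remove the extracted text file to save disk space
    erase `"`stem'.txt"'
}

use `all', clear
save all.dta, replace
```

With this structure the loop runs once per existing file (about 10,000 iterations) instead of once per integer between 11000000000 and 54000000000, so the time spent on capture confirm file for names that do not exist disappears.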
