What resources do you recommend? What advice would you give to help save time and optimize learning? What projects lend themselves best to Stata?
↧
Just installed Stata, where do I start?
↧
Rename a List of Variables by Extracting From Old Names
Dear All,
I'm trying to rename a list of variables. The original names look like this:
Code:
Australia_a_b, Brazil_c_d, Congo_e_f, Chile_s_j
I'd like to rename the variables to just the country names (extracting the country name from each original name):
Code:
Australia, Brazil, Congo, Chile
I tried to use the "substr" function, but I couldn't find a way to detect the index of the first "_" in the original names.
I would appreciate any help with this!
Many thanks,
Craig
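One possible approach, sketched on a toy dataset built from the names in the post: strpos() returns the position of the first "_" in a string, and substr() can then keep everything before it. The toy values and the local macro name `new` are assumptions for illustration only.

```stata
* Toy data using the variable names from the post (values are made up)
clear
set obs 1
foreach v in Australia_a_b Brazil_c_d Congo_e_f Chile_s_j {
    generate `v' = 1
}

* Keep only the part of each name that precedes the first "_"
foreach v of varlist *_* {
    local new = substr("`v'", 1, strpos("`v'", "_") - 1)
    rename `v' `new'
}
describe
```

Note that rename will stop with an error if two variables share the same prefix (for example Chile_a_b and Chile_c_d), since both would map to the same new name.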
↧
How to use putdocx table?
Hi, I'm having trouble with a putdocx command.
I would like to put a table like this into Word:
macros:
r(name3) : "GS >=3+4"
r(name2) : "GS<=3+3"
r(name1) : "no biopsy"
matrices:
r(Stat3) : 8 x 6
r(Stat2) : 8 x 6
r(Stat1) : 8 x 6
r(StatTotal) : 8 x 6
Using
. tabstat psa trusprosvol fk_score ex_score likelihood_hg, by(GGtot_neg_gg1) statistics (count mean sd min p25 p50 p75 max) columns(statistics) save
I'm using the following code:
putdocx clear
putdocx begin
tabstat psa trusprosvol fk_score ex_score likelihood_hg, by(GGtot_neg_gg1) statistics (count mean sd min p25 p50 p75 max) columns(statistics) save
return list
matrix LabResults1 = r(StatTotal)'
matrix LabResults2 = r(stat3)
matrix LabResults3 = r(stat2)
matrix LabResults4 = r(stat1)
putdocx table Table1 = matrix(LabResults1) , rownames colnames
putdocx table Table2 = matrix (LabResults2)
putdocx table Table3 = matrix (LabResults3)
putdocx table Table4 = matrix (LabResults4)
putdocx save "table_stat1.docx", replace
But I only get the first table and then 3 empty cells.
Can you help me, please?
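One thing that may be worth checking: the return list output shows the matrices as r(Stat1), r(Stat2) and r(Stat3) with a capital S, while the code refers to r(stat1), r(stat2) and r(stat3). Names in Stata are case-sensitive, so these do not refer to the same results. The sketch below uses the auto dataset shipped with Stata rather than the poster's data, and the output file name is made up for the example.

```stata
* Illustration on the auto dataset (not the poster's data)
sysuse auto, clear
tabstat price mpg weight, by(foreign) statistics(count mean sd min p25 p50 p75 max) columns(statistics) save
return list

* Copy the saved matrices using the names exactly as return list shows them
matrix LabResults1 = r(StatTotal)
matrix LabResults2 = r(Stat1)
matrix LabResults3 = r(Stat2)

putdocx clear
putdocx begin
putdocx table Table1 = matrix(LabResults1), rownames colnames
putdocx table Table2 = matrix(LabResults2), rownames colnames
putdocx table Table3 = matrix(LabResults3), rownames colnames
putdocx save "table_stat_example.docx", replace
```

Adding rownames and colnames to every table keeps the statistic and variable labels in the Word output.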
↧
Generating group ids from more groups
Hello all,
I am cleaning a database (my own field work in Colombia, yey!). I have information on the state, the county and the village for each household.
Currently I have categorized the states (there are 4), so they have an id of 1-4. I have categorized the counties (there are 12), so 1-12. Now I want to categorize the villages (around 300 of them), which are currently strings. The problem is that some villages have the same name. Some are in different states (which is not the issue), but some are in the same state but different counties. The data looks a little like this:
State | County | Village |
1 | 2 | Las Palmas |
1 | 2 | San Lorenzo |
1 | 4 | Las Palmas |
2 | 1 | Cristales |
2 | 1 | Sardeña |
3 | 3 | Cristales |
3 | 6 | Puerto Legio |
4 | 6 | Maria Helena |
4 | 12 | Fonda |
I want to turn the villages into numbers (or an identifier), but as you can see, the Las Palmas in state 1 and county 2 is not the same as the Las Palmas in state 1 and county 4. Additionally, the Cristales in state 2 is not the same as the Cristales in state 3.
Is there an easy way to do this with bysort?
Thank you!
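A minimal sketch of one way to get such an identifier, using egen's group() function on the combination of state, county and village. The lowercase variable names and the five example rows are assumptions taken from the table in the post.

```stata
* A few rows from the post's example table
clear
input byte state byte county str20 village
1 2 "Las Palmas"
1 2 "San Lorenzo"
1 4 "Las Palmas"
2 1 "Cristales"
3 3 "Cristales"
end

* One id per distinct state-county-village combination
egen village_id = group(state county village)
list, sepby(state)
```

Because group() compares strings exactly, spelling variants such as "las palmas" and "Las Palmas" would get different ids, so it may help to standardize the names first, for instance with lower() and trim().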
↧
cleaning data
I have a very large data set with over 100,000 observations. Among the variables are Race, ChildID and Month. I want to make sure that the Race recorded for each ChildID is the same in every month (it should not change from month to month). Do you have a suggestion for how I can run a data check on the Race variable?
Here is a small sample of the dataset to give you a better idea of what I want to do. Here you can see that the Race variable for the child with the ID AA000S3H changes from month to month. Race should be "B" in each month.
ChildID BeneMonth Race
AA000S3H 201401 B
AA000S3H 201407 B
AA000S3H 201406 B
AA000S3H 201408 W
AA000S3H 201403 B
AA000S3H 201405 B
AA000S3H 201312 B
AA000S3H 201402 H
AA000S3H 201409 B
AA000S3H 201310 B
AA000S3H 201404 B
AA000S3H 201311 B
AA000W4M 201312 W
AA000W4M 201407 H
AA000W4M 201310 W
AA000W4M 201401 H
AA000W4M 201406 W
AA000W4M 201311 W
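A possible check, sketched on a few rows from the sample: sort Race within each ChildID and compare the first and last values, which differ exactly when Race is not constant for that child. The ID "ZZ000000" and the flag name race_varies are made up for the example.

```stata
* A few rows from the sample, plus one hypothetical consistent child
clear
input str8 ChildID long BeneMonth str1 Race
"AA000S3H" 201401 "B"
"AA000S3H" 201408 "W"
"AA000S3H" 201402 "H"
"AA000W4M" 201312 "W"
"AA000W4M" 201407 "H"
"ZZ000000" 201401 "B"
"ZZ000000" 201402 "B"
end

* Within each child, sort by Race; first and last values differ only if Race varies
bysort ChildID (Race): generate byte race_varies = Race[1] != Race[_N]

* Show all months for the children whose Race is inconsistent
list ChildID BeneMonth Race if race_varies, sepby(ChildID)
```

The flag only identifies children with conflicting values; deciding which value is the correct one (for example the most frequent) is a separate step.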
↧
Gravity Model: reverse causality LEAD variable
Hi,
I am trying to test for potential reverse causality between RTAs using a gravity model.
RTA = 1 if the exporter and importer have an RTA in year t.
The pairid is the distance between exporter and importer.
I would like to generate a lead variable capturing the future level of RTAs (in the next 4 years):
tsset pairid year
gen RTA_LEAD4 = f4.RTA
replace RTA_LEAD4 = 0 if RTA_LEAD4 == .
However, I received this error:
tsset pairid year
repeated time values within panel
I think this is because in my database trade flows are recorded separately in each direction (exports and imports), so each pairid appears twice in each year.
How could I generate RTA_LEAD4 without changing my pairid?
Thanks!!
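One way this might be handled, assuming the data contain exporter and importer identifiers (the variable names exporter, importer and dirpair below are hypothetical): build a separate directional identifier for tsset and leave pairid untouched. The toy data have one country pair observed in both directions for 2000-2005.

```stata
* Toy data: one country pair, both directions, six years
clear
set obs 12
generate str1 exporter = cond(_n <= 6, "A", "B")
generate str1 importer = cond(_n <= 6, "B", "A")
generate int year = 2000 + mod(_n - 1, 6)
generate byte RTA = year >= 2003
generate pairid = 1

* Directional identifier: unique per exporter-importer combination
egen dirpair = group(exporter importer)
tsset dirpair year

* Value of RTA four years ahead; missing where year + 4 is not observed
generate RTA_LEAD4 = F4.RTA
list exporter importer year RTA RTA_LEAD4, sepby(dirpair)
```

Whether the trailing missing values should later be set to 0, as in the original code, is a modelling choice, since it treats "not observed" as "no RTA".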
↧
inquire about capture command
I came across a do-file that starts:
Code:
clear *
capture cd "~/Dropbox/Projects/The Demand for Status/Final_data_QJE"
set more off
Can anyone explain the second line for me?
Many thanks in advance!
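As a small illustration of what the capture prefix does: it runs the command, suppresses its output and any error, and stores the return code in _rc, so the do-file keeps going even if the folder does not exist on the current machine. The folder name below is made up and is assumed not to exist.

```stata
* capture suppresses the error from cd and records the return code in _rc
capture cd "this/folder/should/not/exist"
display "return code from cd: " _rc

* Execution continues even though cd failed
display "still running"
```

A return code of 0 would mean the cd succeeded; any nonzero value means it failed.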
↧
How to split?
Dear All, I have this data set,
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str61 y
"4.4%"
"One Year Deposit Rate+3.25%"
"Five Year Deposit Rate-2.25%"
end
and wish to obtain the following result
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 y1 str5 y2
"" "4.4%"
"One Year Deposit Rate" "3.25%"
"Five Year Deposit Rate" "2.25%"
end
Any suggestion is appreciated. Thanks.
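A sketch of one way to get from the first layout to the second, assuming every value ends in a number followed by "%" and that any text before it is separated by a single "+" or "-" sign. It uses regexm()/regexs() to pull out the trailing percentage and substr() to keep what precedes the sign.

```stata
clear
input str61 y
"4.4%"
"One Year Deposit Rate+3.25%"
"Five Year Deposit Rate-2.25%"
end

* y2: the trailing number followed by "%"
generate y2 = regexs(0) if regexm(y, "[0-9.]+%$")

* y1: the text in front of the sign that precedes that number, if any
generate y1 = ""
replace y1 = substr(y, 1, strlen(y) - strlen(y2) - 1) if strlen(y) > strlen(y2)
order y1 y2, after(y)
list
```

As in the desired result shown in the post, the sign itself is dropped; if it matters, it could be kept in a third variable.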
↧
inquire foreach list
I find that
Code:
foreach x in Gold Platinum_upgrade Platinum_upgrade_merit {
In fact, there is no Gold variable in the dataset. Instead, there is a Gold_benefits variable. It seems that variable names in the foreach list can be truncated. Is my understanding correct?
Another problem is that I cannot run the foreach loop in the do-file. It always returns "invalid syntax r(198);"
Many thanks in advance!
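A small illustration on the auto dataset shipped with Stata (not the poster's data): with "foreach x in", the list items are passed through as plain words, and it is the command inside the loop that resolves them as variable abbreviations, which Stata allows by default.

```stata
sysuse auto, clear

* "pri" and "mp" are not variable names; foreach treats them as plain words
foreach x in pri mp {
    * summarize resolves the abbreviations to price and mpg
    summarize `x'
}
```

On the r(198) error, one thing to check is that the opening brace sits on the same line as foreach, the closing brace is on a line by itself, and the whole loop is run together rather than one line at a time.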
↧
Is ROC curve for 3x3 table possible?
Dear All,
I have a question: is a ROC curve possible for a 3x3 or 2x3 table?
If so,
1. Can anyone please give an example and the hypothesis statements?
2. What will the ROC curve look like in that case?
3. What is the Stata code for that ROC curve?
In my case, I am comparing blood sugar level (test) with HbA1c (gold standard) in 3 categories: "normal", "Pre-Diabetic" and "Diabetic".
Please let me know the answers to the questions above.
Thanks a lot in advance.
↧
panel data with three variables
Hi everybody.
I have panel data with three variables: year, country, product. I want to run a logit model, so first of all I have to set up my data. As I am a beginner, I don't know how to set up my panel data with three variables.
↧
I am working on a systematic review and meta-analysis of clinical data (i.e., antimicrobial resistance).
I have already completed the manuscript. The primary outcome of the article is the pooled estimate of the different clinical data describing antimicrobial resistance in Ethiopia.
Unfortunately, my attempt to obtain the "Metaprop" estimate using the dialog box was erroneous, and in the end the manuscript was rejected.
I need support on how to run "Metaprop" using the dialog box so that I can present the results in a forest plot.
Thanks for your assistance!
↧
window fopen DIRECTORY
Stata/MP 16.0 for Windows (64-bit x86-64) Revision 08 Jan 2020
Microsoft Windows [Version 10.0.17763.973]
When window fopen is used in a program, it seems that window fopen is executed in the directory of the ado-file and not in the current directory. (Have I missed some obvious option?)
Code:
prog define fopen , nclass
window fopen macroname "title" "*.*"
end
The above will "list" the files from the directory of the ado, not from the current directory.
↧
MIMIC (SEM) Models with panel (longitudinal) data
Is it possible to estimate a MIMIC (SEM) model for a panel of countries and years? Could I replicate, using Stata, Table 6 of this paper: Dybka, P., Kowalczuk, M., Olesiński, B., Torój, A., & Rozkrut, M. (2019). Currency demand and MIMIC models: towards a structured hybrid method of measuring the shadow economy. International Tax and Public Finance, 26(1), 4-40.
Thanks
↧
Replace some observation for String variable
Hello,
Assume I have two variables, Region and Oceania, with a large number of observations. Example:
Region | Oceania |
North America | 0 |
Asia | 0 |
Europe | 0 |
Australia | 1 |
New Zealand | 1 |
I'd like to change "Australia" and "New Zealand" to "Oceania" by using this command:
replace Region = "Oceania" if Oceania = 1.
But this doesn't work. Can somebody help me with this problem?
Thank you
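A likely cause is the condition: in Stata, a single = is assignment, while comparisons in an if qualifier need ==, and the trailing period should be dropped as well. The sketch below rebuilds the example table as toy data.

```stata
* Toy data matching the example in the post
clear
input str13 Region byte Oceania
"North America" 0
"Asia" 0
"Europe" 0
"Australia" 1
"New Zealand" 1
end

* Comparison uses ==; a single = is reserved for assignment
replace Region = "Oceania" if Oceania == 1
list
```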
↧
Phillips and Sul technique
Dear all,
I am using the Phillips and Sul (2007) technique for convergence test and club identification for my PhD research. My problem is that sometimes, for one of the clubs identified (with the psecta and default options), the TStat in the output table is below the threshold value of -1.65, whereas if I use the adjusted method (Schnurbus, 2017) with the same dataset, therefore including the 'adjust' command specification, the results are ok. On the contrary, when using a different dataset, the same problem (with the TStat value below -1.65 for one of the clubs) happens when using the adjusted method, while the other method is ok.
How can I handle the issue? Why does it happen? Should I use some specific options or just discard the anomalous clustering obtained and choose the alternative method? How can I motivate this in my PhD research?
Thank you very much in advance for your reply, I really hope you can help me!
↧
Cluster standard error for random effect logit model - without vce(bootstrap)?
Hello everyone,
I have an issue with Stata and I would be grateful for your support.
I'm working with unbalanced panel data and using a random-effects logit model.
By that I mean, I'm using the following command:
xtlogit dep_var indep_var, re vce(bootstrap, rep(50) bca)
My issue is that with the vce(bootstrap) option, Stata takes forever to produce any output. Is there another way to get clustered standard errors for this -xtlogit, re- command?
Thank you in advance.
Best regards,
Yasemin
↧
How to resolve numeric overflow while performing xtset,fe in stata?
Dear all,
I am getting error r(1400), "combinations results in numeric overflow; computations cannot proceed", while running xtlogit, fe in Stata with 5738 observations (about 1900 individuals x 3 rounds).
Please consider the following sample data set for this purpose:
Code:
input str3 ID byte(round hi acc inf shock)
IN1 1 1 0 1 1
IN1 2 1 1 1 1
IN1 3 0 0 1 1
IN2 1 1 1 0 1
IN2 2 0 0 1 0
IN2 3 1 0 0 0
end
list
     +--------------------------------------+
     |  ID   round   hi   acc   inf   shock |
     |--------------------------------------|
  1. | IN1       1    1     0     1       1 |
  2. | IN1       2    1     1     1       1 |
  3. | IN1       3    0     0     1       1 |
  4. | IN2       1    1     1     0       1 |
  5. | IN2       2    0     0     1       0 |
     |--------------------------------------|
  6. | IN2       3    1     0     0       0 |
     +--------------------------------------+
I set up the panel as follows:
Code:
encode ID, gen(ID1)
drop ID
rename ID1 ID
xtset round ID
However, when I ran
Code:
xtlogit hi inf shock, fe
I got the following from the original data set:
Code:
1,913 (group size) take 1,640 (# positives) combinations results in numeric overflow; computations cannot proceed
r(1400)
The same regression with
Code:
xtlogit acc inf shock, fe
returned the regression results in my original data set.
I am confused as to why I'm getting numeric overflow with only 5738 observations. Also, please suggest a way to resolve this problem.
Thanks and Regards
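One thing that may be worth checking is the order of the variables in xtset, which expects the panel (individual) variable first and the time variable second. The reported group size of 1,913 is close to the number of individuals, which would be consistent with each round being treated as one group. A sketch on the sample data, where the name id_num is made up for the example:

```stata
clear
input str3 ID byte(round hi acc inf shock)
"IN1" 1 1 0 1 1
"IN1" 2 1 1 1 1
"IN1" 3 0 0 1 1
"IN2" 1 1 1 0 1
"IN2" 2 0 0 1 0
"IN2" 3 1 0 0 0
end

* Numeric id for the individual
encode ID, generate(id_num)

* Panel variable first, then the time variable
xtset id_num round
xtdescribe
```

With the panel declared this way, each group in xtlogit, fe is one individual with at most three rounds, rather than one round with all individuals.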
↧
i. vs c.
Could someone explain to me the difference between i.variable and c.variable?
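A brief illustration on the auto dataset shipped with Stata: the i. prefix tells Stata to treat a variable as categorical and enter it as a set of indicators, while c. tells it to treat the variable as continuous. The distinction matters most in interactions, where an unprefixed variable is assumed to be categorical.

```stata
sysuse auto, clear

* i. : rep78 as categorical, one indicator per level (base level omitted)
regress price i.rep78

* c. : rep78 as continuous, a single slope
regress price c.rep78

* Both together: the slope of mpg is allowed to differ by foreign
regress price i.foreign##c.mpg
```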
↧
ml maximize, technique(bhhh): option technique() not allowed
Hello,
I have some problems using the maximum likelihood command of Stata to estimate a probit model.
Here is a simplified example of my problem: I am interested in estimating the effect of past experienced stock market returns of households on their stock market participation (controlling for other household characteristics).
This is my likelihood function:
-----------------------------------------------------------------------------------------------------
capture program drop nlprobitlf_stock
program nlprobitlf_stock
version 11
args lnfj xb b k
tempvar xnl
global lambda = `k'
rmatcalc_stock
quietly generate double `xnl' = w
drop w
quietly replace `lnfj' = lnnormal(`xb'+`b'*`xnl') if ($ML_y1 == 1)
quietly replace `lnfj' = lnnormal(-`xb'-`b'*`xnl') if ($ML_y1 == 0)
end
----------------------------------------------------------------------------------------------------
where rmatcalc_stock is a function that calculates the weighted sum of the stock returns a household experienced during its lifetime. The weights depend on k, which should also be estimated within the probit model.
This is my code for the optimization problem:
--------------------------------------------------------------------------------------------------------------------------------
ml model lf nlprobitlf_stock (`dependent_var' = `controls') /b /k [pweight = wgt]
ml search /// search initial value
ml maximize, difficult technique (bhhh) nonrtolerance
---------------------------------------------------------------------------------------------------------------------------------
If I run this, I get the following error message:
initial: log pseudolikelihood = -2.388e+08
rescale: log pseudolikelihood = -3395333.6
rescale eq: log pseudolikelihood = -3385030.8
Iteration 0: log pseudolikelihood = -3385030.8
Iteration 1: log pseudolikelihood = -2737459.7 (not concave)
Iteration 2: log pseudolikelihood = -2685007.4
Iteration 3: log pseudolikelihood = -2681058.8 (not concave)
Iteration 4: log pseudolikelihood = -2680772 (not concave)
Iteration 5: log pseudolikelihood = -2680771.5
Iteration 6: log pseudolikelihood = -2680524.6
Iteration 7: log pseudolikelihood = -2680520.2
Iteration 8: log pseudolikelihood = -2680520.2 (backed up)
option technique() not allowed
Does anyone know where this error may come from? Thank you very much in advance!
↧