Just installed Stata, where do I start?

January 31, 2020, 11:55 am

What resources do you recommend? What advice would you give to help save time and optimize learning? What projects lend themselves best to Stata?

Rename a List of Variables by Extracting From Old Names

January 31, 2020, 12:18 pm

Dear All,

I'm trying to rename a list of variables whose original names look like this:

Code:

Australia_a_b, Brazil_c_d, Congo_e_f, Chile_s_j

And I'm trying to rename the variables to only the country names (extract the country name from the original names):

Code:

Australia, Brazil, Congo, Chile

I tried to use the "substr" function, but I wasn't able to find a method to detect the index of the first "_" in the original names.

I would appreciate any help with this!

Many thanks,
Craig
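A minimal sketch, assuming every variable to be renamed contains an underscore and that the stripped names are unique, valid variable names: strpos() returns the position of the first "_", and substr() keeps everything before it. The four variables are generated only so that the example runs on its own.

```stata
clear
set obs 1

* create the example variables named in the post
foreach v in Australia_a_b Brazil_c_d Congo_e_f Chile_s_j {
    generate byte `v' = 1
}

* rename each variable to the part of its name before the first "_"
foreach v of varlist *_* {
    local pos = strpos("`v'", "_")
    local newname = substr("`v'", 1, `pos' - 1)
    rename `v' `newname'
}
describe, simple
```

If two variables share the same country prefix, rename stops with an error because the new name already exists.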

How to use putdocx table?

January 31, 2020, 1:07 pm

Hi, I'm having trouble with a putdocx command.

I would like to put tables with the following results into Word:

macros:
r(name3) : "GS >=3+4"
r(name2) : "GS<=3+3"
r(name1) : "no biopsy"

matrices:
r(Stat3) : 8 x 6
r(Stat2) : 8 x 6
r(Stat1) : 8 x 6
r(StatTotal) : 8 x 6

These results come from:
. tabstat psa trusprosvol fk_score ex_score likelihood_hg, by(GGtot_neg_gg1) statistics (count mean sd min p25 p50 p75 max) columns(statistics) save

I'm using the following code:

Code:

putdocx clear
putdocx begin

tabstat psa trusprosvol fk_score ex_score likelihood_hg, by(GGtot_neg_gg1) statistics (count mean sd min p25 p50 p75 max) columns(statistics) save

return list
matrix LabResults1 = r(StatTotal)'

matrix LabResults2 = r(stat3)
matrix LabResults3 = r(stat2)
matrix LabResults4 = r(stat1)

putdocx table Table1 = matrix(LabResults1) , rownames colnames
putdocx table Table2 = matrix (LabResults2)
putdocx table Table3 = matrix (LabResults3)
putdocx table Table4 = matrix (LabResults4)

putdocx save "table_stat1.docx", replace

But I only get the first table, followed by three empty cells.

Can you help me, please?
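One likely cause, judging from the return list output above: Stata names are case-sensitive, so r(stat3), r(stat2), and r(stat1) are not the saved matrices; tabstat stores them as r(Stat3), r(Stat2), and r(Stat1). The sketch below uses the auto dataset shipped with Stata and a two-group by() variable in place of the original data (an assumption), spells the matrix names with a capital S, and transposes every matrix the way the original code transposes r(StatTotal).

```stata
sysuse auto, clear
putdocx clear
putdocx begin

* -save- stores r(StatTotal) plus one r(Stat#) matrix per group of by()
tabstat price mpg weight, by(foreign) statistics(count mean sd min p25 p50 p75 max) columns(statistics) save
return list

* names in r() are case-sensitive: r(Stat1), not r(stat1)
matrix LabResults1 = r(StatTotal)'
matrix LabResults2 = r(Stat1)'
matrix LabResults3 = r(Stat2)'

* rownames colnames keeps the variable and statistic labels in each table
putdocx table Table1 = matrix(LabResults1), rownames colnames
putdocx table Table2 = matrix(LabResults2), rownames colnames
putdocx table Table3 = matrix(LabResults3), rownames colnames

putdocx save "table_stat1.docx", replace
```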

Generating group ids from more groups

January 31, 2020, 2:09 pm

Hello all,

I am cleaning a database (from my own fieldwork in Colombia, yay!). For each household I have information on the state, the county, and the village.

Currently I have coded the states (there are 4) with ids 1-4 and the counties (there are 12) with ids 1-12. Now I want to code the villages (around 300 of them), which are currently strings. The problem is that some villages have the same names. Some are in different states (which is not an issue), but some are in the same state and different counties. The data look a little like this:

State	County	Village
1	2	Las Palmas
1	2	San Lorenzo
1	4	Las Palmas
2	1	Cristales
2	1	Sardeña
3	3	Cristales
3	6	Puerto Legio
4	6	Maria Helena
4	12	Fonda

I want to turn the villages into numbers (identifiers), but as you can see, Las Palmas in state 1, county 2 is not the same Las Palmas as in state 1, county 4. Likewise, Cristales in state 2 is not the same as Cristales in state 3.

Is there an easy way to do this with bysort?

Thank you!
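A sketch using the sample rows above: egen's group() function numbers each distinct State-County-Village combination, so identical village names in different counties or states get different ids. The bysort version builds the same numbering by hand, flagging the first observation of each combination and taking a running sum.

```stata
clear
input byte State byte County str20 Village
1 2  "Las Palmas"
1 2  "San Lorenzo"
1 4  "Las Palmas"
2 1  "Cristales"
2 1  "Sardeña"
3 3  "Cristales"
3 6  "Puerto Legio"
4 6  "Maria Helena"
4 12 "Fonda"
end

* one id per distinct State-County-Village combination
egen village_id = group(State County Village)

* the same numbering with -by-: mark the first row of each combination, then count
sort State County Village
by State County Village: generate byte first = _n == 1
generate village_id2 = sum(first)
list, sepby(State)
```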

cleaning data

January 31, 2020, 2:57 pm

I have a very large dataset with over 100,000 observations. Among the variables are Race, ChildID, and Month. I want to make sure that the Race for each ChildID is the same in every month (does not change from month to month). Do you have a suggestion for how I can run a data check on the Race variable?

Here is a small sample of the dataset to give you a better idea of what I want to do. You can see that the Race variable for the child with ID AA000S3H changes from month to month; Race should be "B" in every month.
ChildID BeneMonth Race
AA000S3H 201401 B
AA000S3H 201407 B
AA000S3H 201406 B
AA000S3H 201408 W
AA000S3H 201403 B
AA000S3H 201405 B
AA000S3H 201312 B
AA000S3H 201402 H
AA000S3H 201409 B
AA000S3H 201310 B
AA000S3H 201404 B
AA000S3H 201311 B
AA000W4M 201312 W
AA000W4M 201407 H
AA000W4M 201310 W
AA000W4M 201401 H
AA000W4M 201406 W
AA000W4M 201311 W
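A possible check, sketched on the sample rows above: within each ChildID, sort by Race and compare the first and last values; they differ only if Race is not constant. Filling in the most frequent value with egen's mode() is a hypothetical correction rule, not something the post specifies; when a child has tied modes, mode() returns missing unless an option such as minmode or maxmode is given.

```stata
clear
input str8 ChildID long BeneMonth str1 Race
"AA000S3H" 201401 "B"
"AA000S3H" 201407 "B"
"AA000S3H" 201406 "B"
"AA000S3H" 201408 "W"
"AA000S3H" 201403 "B"
"AA000S3H" 201405 "B"
"AA000S3H" 201312 "B"
"AA000S3H" 201402 "H"
"AA000S3H" 201409 "B"
"AA000S3H" 201310 "B"
"AA000S3H" 201404 "B"
"AA000S3H" 201311 "B"
"AA000W4M" 201312 "W"
"AA000W4M" 201407 "H"
"AA000W4M" 201310 "W"
"AA000W4M" 201401 "H"
"AA000W4M" 201406 "W"
"AA000W4M" 201311 "W"
end

* within each child, sort by Race: first and last values differ only if Race varies
bysort ChildID (Race): generate byte race_varies = Race[1] != Race[_N]
list ChildID BeneMonth Race if race_varies, sepby(ChildID) noobs

* hypothetical correction rule: each child's most frequent Race
bysort ChildID: egen race_mode = mode(Race)
```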

Gravity Model: reverse causality LEAD variable

January 31, 2020, 5:35 pm

Hi,

I am trying to test for potential reverse causality between RTAs using a gravity model.

RTA = 1 if the exporter and importer have an RTA in year t.
The pairid is the distance between exporter and importer.

I would like to generate a lead variable capturing the future level of RTAs (in the next 4 years):

tsset pairid year
gen RTA_LEAD4 = f4.RTA
replace RTA_LEAD4 = 0 if RTA_LEAD4 == .

However, I received this error:

tsset pairid year
repeated time values within panel

I think this is because in my database trade flows are recorded separately in each direction (exports and imports), so each pairid appears twice in each year.

How could I generate the RTA_LEAD4 without changing my pairid?

Thanks!!
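One option, sketched under assumptions: the panel needs one identifier per trade direction, which can be built from hypothetical exporter and importer variables with egen group() while pairid stays untouched. The toy data (one pair, both directions, six years, an RTA in force from 2003) are invented for illustration.

```stata
clear
* hypothetical data: one country pair observed in both trade directions
input str3 exporter str3 importer byte pairid
"AUS" "BRA" 1
"BRA" "AUS" 1
end
expand 6
bysort exporter importer: generate int year = 1999 + _n
generate byte RTA = year >= 2003

* -xtset pairid year- fails here: each pairid appears twice per year.
* A directional id makes each panel unique and leaves pairid unchanged.
egen long dir_pair = group(exporter importer)
xtset dir_pair year
generate byte RTA_LEAD4 = F4.RTA
* as in the post: treat leads beyond the sample as 0
replace RTA_LEAD4 = 0 if missing(RTA_LEAD4)
```

Note that the last replace also sets the final four years of each panel to 0, which treats "not observed" as "no RTA".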

inquire about capture command

January 31, 2020, 7:24 pm

I came across a do-file that starts like this:

Code:

clear *
capture cd "~/Dropbox/Projects/The Demand for Status/Final_data_QJE"
set more off

Can anyone explain the second line (capture cd ...) to me?

Many thanks in advance!
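capture runs the command that follows it and suppresses any error, storing the return code in _rc (0 on success). In the do-file above, this lets the script keep running on a computer where that Dropbox folder does not exist; Stata simply stays in the current working directory. A minimal illustration with a folder name that is assumed not to exist:

```stata
* without -capture-, this line would stop the do-file with an error
capture cd "this/folder/does/not/exist"
display "return code: " _rc
if _rc != 0 {
    display "cd failed; still in `c(pwd)'"
}
```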

How to split?

January 31, 2020, 7:42 pm

Dear All, I have this dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str61 y
"4.4%"                    
"One Year Deposit Rate+3.25%"
"Five Year Deposit Rate-2.25%" 
end

and I wish to obtain the following result:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 y1 str5 y2
""                       "4.4%" 
"One Year Deposit Rate"  "3.25%"
"Five Year Deposit Rate" "2.25%"
end

Any suggestion is appreciated. Thanks.
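A sketch with a regular expression, assuming every value is either a bare rate or a text label followed by a signed rate: group 1 captures the leading text (no digits or signs) and group 2 captures the rate. ustrregexm() and ustrregexs() require Stata 14 or later.

```stata
clear
input str61 y
"4.4%"
"One Year Deposit Rate+3.25%"
"Five Year Deposit Rate-2.25%"
end

* group 1: leading text without digits or signs; group 2: the rate
local re "^([^0-9+\-]*)[+\-]?([0-9.]+%)$"
generate y1 = ustrregexs(1) if ustrregexm(y, "`re'")
generate y2 = ustrregexs(2) if ustrregexm(y, "`re'")
list y1 y2
```

As in the requested output, the sign is dropped, so "-2.25%" becomes "2.25%"; moving [+\-]? inside the second group would keep it.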

inquire foreach list

January 31, 2020, 9:02 pm

I noticed the following loop:

Code:

foreach x in  Gold Platinum_upgrade Platinum_upgrade_merit {

In fact, there is no Gold variable in the dataset; instead, there is a Gold_benefits variable. It seems that variable names in the foreach list can be abbreviated. Is my understanding correct?

Another problem is that I cannot run the foreach loop from the do-file. It always reports "invalid syntax r(198);".

Many thanks in advance!
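A sketch of the abbreviation behavior, using three variables created only for illustration. foreach ... in loops over plain words, and Gold works inside the loop because Stata accepts an unambiguous abbreviation of a variable name (here, of Gold_benefits); foreach ... of varlist expands the abbreviation up front, and set varabbrev off disables abbreviations altogether. As for r(198), a frequent cause (not confirmed from the post) is running only some of the loop's lines, or placing the opening brace anywhere other than at the end of the foreach line; the closing brace must be on a line of its own.

```stata
clear
set obs 3
generate byte Gold_benefits = 1
generate byte Platinum_upgrade = 2
generate byte Platinum_upgrade_merit = 3

* -in- loops over plain words; "Gold" is accepted as an abbreviation of Gold_benefits
foreach x in Gold Platinum_upgrade Platinum_upgrade_merit {
    summarize `x'
}

* -of varlist- expands abbreviations before the loop runs
foreach x of varlist Gold Platinum_upgrade Platinum_upgrade_merit {
    display "`x'"
}

* with abbreviations switched off, "Gold" is no longer found
set varabbrev off
capture summarize Gold
display _rc
set varabbrev on
```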

Is ROC curve for 3x3 table possible?

January 31, 2020, 10:14 pm

Dear All,
I am confused about something: is an ROC curve possible for a 3x3 or 2x3 table?
If so,
1. Can anyone please give an example and the hypothesis statements?
2. What would the ROC curve look like in that case?
3. What is the Stata code for such an ROC curve?

In my case, I am comparing blood sugar level (the test) with HbA1c (the gold standard) in 3 categories: "normal", "Pre-Diabetic", and "Diabetic".

Please let me know the answers to the questions above.

Thanks a lot in advance.

panel data with three variables

February 1, 2020, 1:13 am

Hi everybody.

I have panel data with three identifying variables: year, country, and product. I want to run a logit model, so first I have to declare my data as panel data. As I am a beginner, I don't know how to set up panel data with three variables.
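A minimal sketch, assuming the panel unit is a country-product pair observed over years: xtset accepts a single panel variable, so the two identifiers can be combined with egen group(). The countries, products, and years below are made up for illustration.

```stata
clear
input str3 country str6 product int year
"COL" "coffee" 2018
"COL" "coffee" 2019
"COL" "banana" 2018
"COL" "banana" 2019
"ETH" "coffee" 2018
"ETH" "coffee" 2019
end

* one panel id per country-product pair, then declare the panel
egen long panelid = group(country product)
xtset panelid year
```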

I am working on systematic review and meta-analysis on clinical data( i.e. Antimicrobial resistance).

February 1, 2020, 1:18 am

I have already completed the manuscript. The primary outcome of the article is to describe the pooled estimate of the different clinical data describing antimicrobial resistance in Ethiopia.
Unfortunately, my attempt to obtain the pooled estimate with "metaprop" using the dialog box produced erroneous results, and the manuscript was ultimately rejected.
I need specific support on how to run "metaprop" from the dialog box so that I can present the results in a forest plot.

Thanks for your assistance!

window fopen DIRECTORY

February 1, 2020, 1:44 am

Stata/MP 16.0 for Windows (64-bit x86-64) Revision 08 Jan 2020
Microsoft Windows [Version 10.0.17763.973]

When window fopen is used in a program, it seems to be executed in the directory of the ado-file rather than in the current directory. (Have I missed some obvious option?)

Code:

prog define fopen , nclass

window fopen macroname "title" "*.*"

end

The above will "list" the files from the directory of the ado, not from the current directory.

MIMIC (SEM) Models with panel (longitudinal) data

February 1, 2020, 1:55 am

Is it possible to estimate a MIMIC (SEM) model for a panel of countries and years? Could I replicate Table 6 of the following paper using Stata: Dybka, P., Kowalczuk, M., Olesiński, B., Torój, A., & Rozkrut, M. (2019). Currency demand and MIMIC models: towards a structured hybrid method of measuring the shadow economy. International Tax and Public Finance, 26(1), 4-40.
Thanks

Replace some observation for String variable

February 1, 2020, 1:59 am

Hello,
Suppose I have two variables, Region and Oceania, with many observations. For example:

Region	Oceania
North America	0
Asia	0
Europe	0
Australia	1
New Zealand	1

I'd like to change "Australia" and "New Zealand" to "Oceania" using this command:
replace Region = "Oceania" if Oceania = 1.

But this doesn't work. Can somebody help me with this problem?
Thank you
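A likely fix: inside an if qualifier, equality is tested with ==, and a single = is not a valid comparison there. The trailing period, if it was typed as part of the command, would also need to go. A sketch on the example rows:

```stata
clear
input str13 Region byte Oceania
"North America" 0
"Asia"          0
"Europe"        0
"Australia"     1
"New Zealand"   1
end

* == compares; = alone is not allowed inside -if-
replace Region = "Oceania" if Oceania == 1
list
```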

Phillips and Sul technique

February 1, 2020, 4:34 am

Dear all,

I am using the Phillips and Sul (2007) technique for convergence testing and club identification in my PhD research. My problem is that sometimes, for one of the clubs identified (with psecta and its default options), the TStat in the output table is below the threshold value of -1.65, whereas if I use the adjusted method (Schnurbus, 2017) on the same dataset, i.e., adding the 'adjust' option, the results are fine. Conversely, with a different dataset, the same problem (a TStat below -1.65 for one of the clubs) occurs with the adjusted method, while the default method works fine.

How can I handle the issue? Why does it happen? Should I use some specific options or just discard the anomalous clustering obtained and choose the alternative method? How can I motivate this in my PhD research?

Thank you very much in advance for your reply, I really hope you can help me!

Cluster standard error for random effect logit model - without vce(bootstrap)?

February 1, 2020, 5:22 am

Hello everyone,
I have an issue with Stata and I would be grateful for your support.

I'm working with unbalanced panel data and use a random-effects logit model.
That is, I'm using the following command:
xtlogit dep_var indep_var, re vce(bootstrap, rep(50) bca)

My issue is that with the vce(bootstrap) option, Stata takes a very long time to produce any output. Is there another way to get clustered standard errors for xtlogit, re?

Thank you in advance.

Best regards,
Yasemin

How to resolve numeric overflow while performing xtset,fe in stata?

February 1, 2020, 6:36 am

Dear all,
I am getting error r(1400), "combinations results in numeric overflow; computations cannot proceed", when running xtlogit, fe in Stata with 5,738 observations (about 1,900 individuals x 3 rounds).
Please consider the following sample dataset:

Code:

clear
input str3 ID byte(round hi acc inf shock)
IN1 1 1 0 1 1
IN1 2 1 1 1 1
IN1 3 0 0 1 1
IN2 1 1 1 0 1
IN2 2 0 0 1 0
IN2 3 1 0 0 0
end

. list

     +--------------------------------------+
     |  ID   round   hi   acc   inf   shock |
     |--------------------------------------|
  1. | IN1       1    1     0     1       1 |
  2. | IN1       2    1     1     1       1 |
  3. | IN1       3    0     0     1       1 |
  4. | IN2       1    1     1     0       1 |
  5. | IN2       2    0     0     1       0 |
     |--------------------------------------|
  6. | IN2       3    1     0     0       0 |
     +--------------------------------------+

I set up the panel as follows:

Code:

encode ID, gen(ID1)
drop ID
rename ID1 ID
xtset round ID

However, when I ran

Code:

xtlogit hi inf shock, fe

On the original dataset, I got the following error:

Code:

1,913 (group size) take 1,640 (# positives) combinations results in numeric overflow; computations cannot proceed r(1400)

The same regression with

Code:

xtlogit acc inf shock, fe

ran without errors on my original dataset.

I am confused as to why I get a numeric overflow with only 5,738 observations. Also, please suggest a way to resolve this problem.

Thanks and Regards
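One likely cause: xtset takes the panel variable first and the time variable second, so xtset round ID makes round the panel and ID the time variable. With only three rounds, each "group" then holds roughly 1,900 observations, which matches the group size of 1,913 in the error message; the conditional fixed-effects likelihood sums over all combinations of that many observations taken 1,640 at a time, and that count overflows. A sketch of the setup with the arguments in the other order, using the sample data above:

```stata
clear
input str3 ID byte(round hi acc inf shock)
IN1 1 1 0 1 1
IN1 2 1 1 1 1
IN1 3 0 0 1 1
IN2 1 1 1 0 1
IN2 2 0 0 1 0
IN2 3 1 0 0 0
end

encode ID, generate(id)
* panel variable first, time variable second
xtset id round
* the model from the post would then follow: xtlogit hi inf shock, fe
```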

i. vs c.

February 1, 2020, 7:03 am

Could someone explain to me the difference between i.variable and c.variable?
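In factor-variable notation, i. treats a variable as categorical: Stata adds an indicator for each level and omits a base level (by default the lowest). c. treats it as continuous, entering it as a single regressor with one slope. The prefixes matter most in interactions (# and ##), where a variable without a prefix is assumed to be categorical. A sketch with the auto dataset shipped with Stata:

```stata
sysuse auto, clear

* c.rep78: rep78 enters as one continuous regressor (a single slope)
regress price c.rep78 c.weight

* c.weight#c.weight adds weight squared
regress price c.rep78 c.weight c.weight#c.weight

* i.rep78: one indicator per level of rep78, with level 1 as the base
regress price i.rep78 c.weight
```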

ml maximize, technique(bhhh): option technique() not allowed

February 1, 2020, 7:36 am

Hello,

I am having problems using Stata's ml command to estimate a probit model.
Here is a simplified example of my problem: I am interested in estimating the effect of past experienced stock market returns of households on their stock market participation (controlling for other household characteristics).

This is my likelihood function:
-----------------------------------------------------------------------------------------------------
capture program drop nlprobitlf_stock
program nlprobitlf_stock
version 11
args lnfj xb b k
tempvar xnl

global lambda = `k'

rmatcalc_stock

quietly generate double `xnl' = w
drop w

quietly replace `lnfj' = lnnormal(`xb'+`b'*`xnl') if ($ML_y1 == 1)
quietly replace `lnfj' = lnnormal(-`xb'-`b'*`xnl') if ($ML_y1 == 0)

end
----------------------------------------------------------------------------------------------------
where rmatcalc_stock is a function that calculates the weighted sum of the stock returns a household experienced during its lifetime. The weights depend on k, which should also be estimated within the probit model.

This is my code for the optimization problem:
--------------------------------------------------------------------------------------------------------------------------------
ml model lf nlprobitlf_stock (`dependent_var' = `controls') /b /k [pweight = wgt]
ml search /// search initial value
ml maximize, difficult technique (bhhh) nonrtolerance
---------------------------------------------------------------------------------------------------------------------------------
If I run this, I get the following error message:

initial: log pseudolikelihood = -2.388e+08
rescale: log pseudolikelihood = -3395333.6
rescale eq: log pseudolikelihood = -3385030.8
Iteration 0: log pseudolikelihood = -3385030.8
Iteration 1: log pseudolikelihood = -2737459.7 (not concave)
Iteration 2: log pseudolikelihood = -2685007.4
Iteration 3: log pseudolikelihood = -2681058.8 (not concave)
Iteration 4: log pseudolikelihood = -2680772 (not concave)
Iteration 5: log pseudolikelihood = -2680771.5
Iteration 6: log pseudolikelihood = -2680524.6
Iteration 7: log pseudolikelihood = -2680520.2
Iteration 8: log pseudolikelihood = -2680520.2 (backed up)
option technique() not allowed

Does anyone know where this error may come from? Thank you very much in advance!
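The message suggests that ml maximize does not accept technique(); in Stata's ml, technique() is documented as an option of ml model. Separately, /// is a line-continuation comment in do-files, so ml search /// search initial value joins the following line onto the ml search command; // is the plain end-of-line comment. The sketch below moves technique(bhhh) to ml model. It uses a plain probit evaluator on the auto dataset shipped with Stata, not the nonlinear model from the post, so that the estimates can be checked against probit; the program name myprobit_lf and the rescaled variable wt are hypothetical.

```stata
sysuse auto, clear
generate double wt = weight/1000   // rescaled to help convergence

capture program drop myprobit_lf
program myprobit_lf
    version 11
    args lnfj xb
    quietly replace `lnfj' = lnnormal(`xb')  if $ML_y1 == 1
    quietly replace `lnfj' = lnnormal(-`xb') if $ML_y1 == 0
end

* technique() is given to -ml model-, not to -ml maximize-
ml model lf myprobit_lf (foreign = mpg wt), technique(bhhh)
ml search        // use // here; /// would join the next line onto ml search
ml maximize
```

Other maximize options from the post, such as difficult or nonrtolerance, can stay on ml maximize.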
