Channel: Statalist

Problem with ml convergence

This is my first time using Stata for maximum likelihood estimation, and I'm encountering a problem where my ml evaluator will not converge.

I am using a parametric specification for bid distributions based on Athey, Levin, and Seira (2011). My dataset has one observation per auction with variables for the value of the highest bid, second highest bid, etc. These variables have missing values if a bid of that rank was not submitted in that auction.

The log-likelihood function for auction t is:
\[ \ln L_t\left(\rho,\lambda,\theta\right) = N_t \ln\theta + \ln\Gamma\left(\frac{1}{\theta}+N_t\right) - \ln\Gamma\left(\frac{1}{\theta}\right) + \sum_{i=1}^{N_t}\ln\left(\rho\lambda\left(\frac{b_{it}}{\lambda}\right)^{\rho-1}\right) + \left(\frac{1}{\theta}+N_t\right)\ln\left(1+\theta\sum_{i=1}^{N_t}\left(\frac{b_{it}}{\lambda}\right)^{\rho}\right) \]

N_t is the number of bidders in auction t and Γ denotes the gamma function.

Here's the ml program I wrote. $ML_y1 through $ML_y12 are the top 12 bids, and $ML_y13 is the number of bidders:
Code:
program gammaweibull
 args lnf lambda rho theta
 tempvar ncons nmult lsumpdf sumcdf lsumcdf ///
   ld1 ld2 ld3 ld4 ld5 ld6 ld7 ld8 ld9 ld10 ld11 ld12
 quietly {
  gen double `ncons' = $ML_y13 * ln(`theta') + lngamma(1/`theta' + $ML_y13 ) - lngamma(1/`theta')
  gen double `nmult' = 1/`theta' + $ML_y13
  
  *If bid values are missing from an auction, then the program observes ln(0)...
  *To correct for this, manually replace additional term with 0 for missing observations
  gen double `ld1'  = ln(`rho'*`lambda'*($ML_y1 /`lambda')^(`rho'-1))
  gen double `ld2'  = `ld1'  + ln(`rho'*`lambda'*($ML_y2  /`lambda')^(`rho'-1))
  replace    `ld2'  = `ld1'  if $ML_y2  ==.
  gen double `ld3'  = `ld2'  + ln(`rho'*`lambda'*($ML_y3  /`lambda')^(`rho'-1))
  replace    `ld3'  = `ld2'  if $ML_y3  ==.
  gen double `ld4'  = `ld3'  + ln(`rho'*`lambda'*($ML_y4  /`lambda')^(`rho'-1))
  replace    `ld4'  = `ld3'  if $ML_y4  ==.
  gen double `ld5'  = `ld4'  + ln(`rho'*`lambda'*($ML_y5  /`lambda')^(`rho'-1))
  replace    `ld5'  = `ld4'  if $ML_y5  ==.
  gen double `ld6'  = `ld5'  + ln(`rho'*`lambda'*($ML_y6  /`lambda')^(`rho'-1))
  replace    `ld6'  = `ld5'  if $ML_y6  ==.
  gen double `ld7'  = `ld6'  + ln(`rho'*`lambda'*($ML_y7  /`lambda')^(`rho'-1))
  replace    `ld7'  = `ld6'  if $ML_y7  ==.
  gen double `ld8'  = `ld7'  + ln(`rho'*`lambda'*($ML_y8  /`lambda')^(`rho'-1))
  replace    `ld8'  = `ld7'  if $ML_y8  ==.
  gen double `ld9'  = `ld8'  + ln(`rho'*`lambda'*($ML_y9  /`lambda')^(`rho'-1))
  replace    `ld9'  = `ld8'  if $ML_y9  ==.
  gen double `ld10' = `ld9'  + ln(`rho'*`lambda'*($ML_y10 /`lambda')^(`rho'-1))
  replace    `ld10' = `ld9'  if $ML_y10 ==.
  gen double `ld11' = `ld10' + ln(`rho'*`lambda'*($ML_y11 /`lambda')^(`rho'-1))
  replace    `ld11' = `ld10' if $ML_y11 ==.
  gen double `ld12' = `ld11' + ln(`rho'*`lambda'*($ML_y12 /`lambda')^(`rho'-1))
  replace    `ld12' = `ld11' if $ML_y12 ==.
  gen `lsumpdf' = `ld12'
  
  *Now if the bid is 0 then the additional terms will be 0, so we only need one variable
  gen double `sumcdf'  = ($ML_y1  / `lambda')^(`rho') ///
           + ($ML_y2  / `lambda')^(`rho') ///
           + ($ML_y3  / `lambda')^(`rho') ///
           + ($ML_y4  / `lambda')^(`rho') ///
           + ($ML_y5  / `lambda')^(`rho') ///
           + ($ML_y6  / `lambda')^(`rho') ///
           + ($ML_y7  / `lambda')^(`rho') ///
           + ($ML_y8  / `lambda')^(`rho') ///
           + ($ML_y9  / `lambda')^(`rho') ///
           + ($ML_y10 / `lambda')^(`rho') ///
           + ($ML_y11 / `lambda')^(`rho') ///
           + ($ML_y12 / `lambda')^(`rho')  
  gen double `lsumcdf'  = ln(1+ `theta'*`sumcdf' )
  
  replace `lnf' = `ncons'+`lsumpdf'+(`nmult'*`lsumcdf')
  }
end
When I use ml maximize I get this output:
Code:
. ml maximize

initial:       log likelihood =  134279.26
rescale:       log likelihood =  134279.26
rescale eq:    log likelihood =  141848.78
Iteration 0:   log likelihood =  141848.78  (not concave)
Iteration 1:   log likelihood =  141869.98  (not concave)
Iteration 2:   log likelihood =   141881.5  (not concave)
Iteration 3:   log likelihood =  141882.65  (not concave)
Iteration 4:   log likelihood =  141882.76  (not concave)
Iteration 5:   log likelihood =  141882.81  (not concave)
cannot compute an improvement -- discontinuous region encountered
r(430);
I get similar results when I use the "difficult" option. I tried writing a smaller version of the program for auctions with only three bidders and ran it on that sample and got the same results. I used ml query to look for the problem, and found this:
Code:
Current status
    Coefficient values
        1:                                   -1.5495e-08
        2:                                   -.000023924
        3:                                   265.716277
        4:                                   .000022557
        5:                                   108.852769
        6:                                   3.3868e+17
    Function value:                          .
    Converged:                               no
Coefficient 6 corresponds to theta. I've tried everything I can think of to achieve convergence, but haven't had any luck.

One other problem I have is that I want to use the number of bidders as an independent variable in one of my parameter equations, but Stata automatically omits it because of multicollinearity. Given the form of the likelihood function I don't see how multicollinearity would be a problem, but it could be that I'm missing something. Is there a way to get around this?

dropping consecutive values of a dummy, keeping the first after a 0

Dear all,

I have a treatment variable of which I only need the beginning of the treatment. The dummy equals 1 for 1 to n consecutive periods, and I want to set every consecutive 1 after the first one to 0. If there is a 0 in between followed by another 1, I want to keep that one: it's a new treatment.

In some example data:
Code:
sysuse auto, clear
xtset mpg
gen dummy=1 if gear_ratio>3
replace dummy=0 if dummy==.
sort mpg
For example, lines 65 and 66, or 69, should become 0s in the dummy. My problem is that I cannot work with lags, because in my actual dataset I don't know how long the spells of dummy=1 are. So I need some sort of iterative approach. But as soon as I replace the first consecutive 1, I lose the condition on which to test. I think I'm making this much too complicated, so I wanted to ask you for a little help.

Ideally, I want to do that in a panel so something like bysort mpg: egen newdummy but I think I can figure this part out on my own.

Thanks in advance!

p.s.: In R I would do something like this (admittedly inelegant, I'm no expert in either) with a count variable counting the 1's till a 0 appears (within each mpg), then dropping all counts>1. Is something like that possible in Stata?
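A minimal sketch of one way to do this on the auto example above (untested): a new treatment starts wherever the dummy is 1 and the immediately preceding observation within the same panel is not 1, so only a single lag is needed, however long the spells are.

Code:
* assumes the data are already sorted in time order within each panel (here mpg)
bysort mpg: gen newdummy = dummy == 1 & dummy[_n-1] != 1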

Multiple Functions in Rolling Window Approach

Hi Stata Users,

I'm currently conducting my analysis on a rolling-window basis. In EACH window, I need to estimate 4 residual series from 4 regressions, and these residual series will be used to generate 4 standardized variables (zero mean and unit variance). Then I need to use these standardized variables to perform another regression analysis in the same window. I think a loop will solve my problem, but I'm very poor at programming. I would greatly appreciate it if anyone could help!
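A minimal sketch of the kind of loop involved (untested; the variable names y1-y4, x, w, the time index t, and the 60-period window length are all assumptions, not taken from the original post):

Code:
summarize t, meanonly
local T = r(max)
forvalues s = 60/`T' {
    preserve
    keep if inrange(t, `s' - 59, `s')            // keep the current window
    forvalues j = 1/4 {
        quietly regress y`j' x                   // first-stage regression j
        quietly predict double e`j', residuals
        quietly egen double z`j' = std(e`j')     // standardize: zero mean, unit variance
    }
    quietly regress w z1 z2 z3 z4                // second-stage regression in the window
    display "window ending at t = `s': _b[z1] = " _b[z1]
    restore
}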

How to align two lines in the same plots?

Hi Experts,

I used the twoway scatter and line commands to create 2 lines in one plot. However, there is a horizontal offset between these two lines. How can I make these 2 lines vertically aligned? Basically I want to move the red dashed line a little to the left and make those dots align vertically. Please see my code below. Thanks!

twoway ///
(scatter beta group if group==1 |group==2 , msymbol(S) mcolor(black)) ///
(scatter beta group if group==3 |group==4 , msymbol(Oh) mcolor(black)) ///
(scatter beta group if group==5 |group==6 , msymbol(T) mcolor(black)) ///
(scatter beta group if group==7 |group==8 , msymbol(D) mcolor(black)) ///
(line beta group if group==1 |group==3 | group==5 |group==7, lcolor(black) lwidth(thin)) ///
(line beta group if group==2 |group==4 | group==6 |group==8, lpattern(dash) lcolor(red) lwidth(thin)) ///
, ///
xlabel(none) ///
xtitle("", size(large)) ///
ytitle("β", size(large)) ///
ylab(-0.02(0.04) 0.16) ///
legend(order (1 "unadjusted" 2 "adj1" 3 "adj2" 4 "adj3") position(1) row(1)) ///
graphregion(fcolor(white) lcolor(white) ifcolor(white) ilcolor(white)) ///
note("line: separate dash: multivariate", size(small) ring(0) position(5)) ///
name(g10_full, replace)
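A minimal sketch of one way to do this (untested): plot the dashed (even-numbered) groups at the same x position as the neighbouring solid groups, so that their markers line up vertically.

Code:
gen xpos = group
replace xpos = group - 1 if mod(group, 2) == 0   // shift the dashed series one slot left
twoway (line beta xpos if mod(group, 2) == 1, lcolor(black) lwidth(thin)) ///
       (line beta xpos if mod(group, 2) == 0, lpattern(dash) lcolor(red) lwidth(thin)), ///
       xlabel(none)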

Low Wald chi-square values in GEE

Hello everyone,

I am running a GEE -xtgee- model with an identity link function, a Gaussian (normal) distribution and AR(1) correlation. Furthermore, I use robust variance estimators to control for heteroscedasticity. In order to account for fixed effects I specified the panel with -xtset- based on company and year.

My problem is that I receive relatively small Wald chi-square values which are mostly non-significant. While this might certainly be an indicator that the model is simply not representative and not explaining the DV, I am wondering if there might be other reasons for these low Wald chi-square values. For example, are there specific issues with some of the variables included? Or are there any other ways to optimize the Wald chi-square value?

Especially given the fact that the number of observations as well as the number of variables included are relatively large, the low Wald chi-square values surprise me. Any info or help on this issue would therefore be really appreciated.

Code:
GEE population-averaged model                   Number of obs     =      1,125
Group and time vars:           gvkey fyear      Number of groups  =        161
Link:                             identity      Obs per group:
Family:                           Gaussian                    min =          2
Correlation:                         AR(1)                    avg =        7.0
                                                              max =          9
                                                Wald chi2(21)     =      25.71
Scale parameter:                  1.600558      Prob > chi2       =     0.2178

                                               (Std. Err. adjusted for clustering on gvkey)
-------------------------------------------------------------------------------------------
                          |               Robust
                       EO |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
           Median_TMT_Age |  -.0783608   .0455389    -1.72   0.0853    -.1676154    .0108937
                 TMT_Size |    .033364   .0317819     1.05   0.2938    -.0289274    .0956555
                 Firm_Age |   .0357644   .0809936     0.44   0.6588    -.1229801     .194509
                Firm_Size |   .1577051   .0768751     2.05   0.0402     .0070328    .3083775
          Financial_Slack |   .0733506   .0434145     1.69   0.0911    -.0117403    .1584414
         Past_Performance |   .0175702    .034472     0.51   0.6103    -.0499936     .085134
   Environmental_Dynamism |   .0145041   .0679373     0.21   0.8309    -.1186506    .1476587
    Competitive_Intensity |    .033028   .0658973     0.50   0.6162    -.0961284    .1621844
                          |
                    fyear |
                    2007  |  -.1792236   .1806779    -0.99   0.3212    -.5333458    .1748986
                    2008  |  -.2379827   .1927414    -1.23   0.2169    -.6157488    .1397835
                    2009  |   -.353077   .2056169    -1.72   0.0860    -.7560788    .0499249
                    2010  |  -.3551309    .213963    -1.66   0.0970    -.7744908    .0642289
                    2011  |  -.2732624   .2210608    -1.24   0.2164    -.7065336    .1600088
                    2012  |  -.1856549   .2277244    -0.82   0.4149    -.6319866    .2606767
                    2013  |  -.1176107   .2245817    -0.52   0.6005    -.5577827    .3225612
                    2014  |  -.1420228   .2292282    -0.62   0.5355    -.5913018    .3072561
                          |
                    sic_1 |
            Construction  |          0  (omitted)
           Manufacturing  |  -.2285387   .2568444    -0.89   0.3736    -.7319444    .2748671
Transportation/Utilities  |  -.0719184   .3129316    -0.23   0.8182     -.685253    .5414163
        Retail/Wholesale  |  -.2857057   .3145684    -0.91   0.3637    -.9022484    .3308371
                 Finance  |   .1264767   .2891284     0.44   0.6618    -.4402046     .693158
                Services  |  -.0735055    .311783    -0.24   0.8136    -.6845888    .5375779
                          |
                    _cons |   .3444924   .3007342     1.15   0.2520    -.2449357    .9339205
-------------------------------------------------------------------------------------------
Thanks in advance.

Christian

Repeated time values within panel

Hello People.
I get the error message "repeated time values within panel" when I try to set my data as panel data.
I checked for duplicates, and it seems that there are many duplicates in my data.
Can anybody help me solve this issue?
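A minimal sketch for tracking the problem down (untested; panel_id and year stand in for whatever panel and time variables are actually used):

Code:
duplicates report panel_id year               // how many id-year pairs occur more than once?
duplicates tag panel_id year, gen(dup)
list panel_id year if dup > 0, sepby(panel_id)
* if the flagged rows are exact copies of entire observations, one option is:
* duplicates drop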


Thanks

Code Needed for Person-Year Identification. After Consecutive Years Above a Certain Level, 5 Years later a Specific Decrease Occurs.

Apologies, confidential dataset.

I'm looking for a string of code that would solve the problem described below. I have no idea where to start since deleting data won't work, and since the decrease in data 5 years later can still be above 0.5. Here ya go!

Assume there is a psychological study measuring a group of people's emotional intelligence (EI) over time, rated on a scale from 0 to 1. From 0 to 1, there are 11 ratings (0, 0.1, 0.2, ..., 0.9, 1). There are 200 people, with their EI reevaluated each year from 1960 to 2010. My objective is to find the person-years where the preceding 10 years had an EI consistently greater than 0.5 (all 10 years had EI>0.5), and 5 years later the EI had decreased by 0.2 or more.


Examples:

1.

1. EI of person A from 1960-1974 = 0.6, then 1975-1978 = 0.5, then 1979 = 0.4 would COUNT
2. Same 1960-1977 as above, but 1978 = 0.4, 1979 = 0.5 would COUNT
3. 1960-1969 = 0.6, 1970-1973 = 0.4, 1974 = 0.5 would NOT COUNT

#1 above would need to identify 1975, person A as the person-year.
#2 above would need to identify 1974 as the year, since the 5 year decrease occurred in 1978, but 1979 did not have the same result.
#3 above had no 10 year run greater than 0.5 where 5 years later there was a 0.2 decrease. The only year that could fit this qualification would be 1974 since there was exactly a 10 year run above 0.5 that ended exactly 5 years before.



2.

EI of person B from 1960-1969 = 0.8, 1970-1973 = 0.7, 1974 = 0.6
OR
1960-1969 = 0.8, 1970-1974 = 0.6
OR
1960-1969 = 0.8, 1970-1974 = 0.3
OR
1960-1969 = 0.8, 1970-1973 = 0.3, 1974 = 0.6
ALL WOULD COUNT. They should all return 1970, person B as the person-year.



Any advice on the code for this would be greatly appreciated!! Life of a research assistant.
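A minimal sketch of one reading of the rule (untested; the variable names person, year, and ei are assumptions): flag person-year t when EI exceeds 0.5 in each of the ten preceding years and EI five years later (i.e., at t+4) is at least 0.2 below EI in the last year of that run.

Code:
xtset person year
gen byte run10 = 1
forvalues k = 1/10 {
    replace run10 = 0 if L`k'.ei <= 0.5 | missing(L`k'.ei)
}
gen byte flagged = run10 == 1 & F4.ei <= L1.ei - 0.2 & !missing(F4.ei)
list person year if flagged
* note: with EI stored as a float, the comparison with L1.ei - 0.2 may need a small
* tolerance, e.g. <= L1.ei - 0.2 + 1e-6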

Importing programs from R to STATA

Hello.

I was wondering if someone could tell me how to import the GNM package from R into Stata. Specifically, I'm interested in the Diagonal Reference Function (DREF) in order to use Diagonal Reference Models (DRMs).

Thank you!

Problem with "osample" from "teffects nnmatch"

Dear Statalist,

I am sending this question again because I didn't get an answer the last time I wrote: my team and I are running a propensity score matching analysis on agricultural data in order to assess the impact of credit on several farmer outcomes. For this we are using the command teffects nnmatch, which allows the matching to be performed exactly on the observable covariates, which is useful for categorical variables, for example.

Anyway, we are now getting an error after running the command, which is similar to the one in this thread: http://www.statalist.org/forums/foru...match-question

We first run a line of code that looks like the following:

Code:
teffects nnmatch (dep_variable x_covariate) (treatment)
which gives the following error message:
no exact matches for observation 23472; use option osample() to identify all observations with deficient matches
We then include the option osample:
Code:
teffects nnmatch (dep_variable x_covariate) (treatment) , osample(newvar)

After Stata identifies the observations for which an exact match can't be found (they have a 1 in newvar), we do the following:


Code:
teffects nnmatch (dep_variable x_covariate) (treatment) if newvar == 0
Unfortunately, we end up with an error message similar to the one from the first run, that is, Stata claims it didn't find exact matches for a certain observation, different from the one at the beginning. Does anybody know a solution for this problem?


Best regards,


Juan Hernández

dow-dummy omitted because of collinearity

Dear all,

I have panel data on daily stock returns for 225 companies over 3 years. I tried to regress each return on 12 of its own lags together with 5 day-of-the-week dummy variables (Monday (mon) through Friday (fri)). Together with the regression results I get the following note reported:
"fri omitted because of collinearity"


This, however, doesn't make sense to me intuitively or statistically, as the correlation matrix doesn't show any correlations above 0.3 between Friday and the rest of the independent variables, or for any other combination of variables. Is there a way to resolve this issue, or to tell Stata not to drop this variable in the regression?

I used the following code:
Code:
foreach indepvar in idr {
    reg logr logr_1 logr_2 logr_3 logr_4 logr_5 logr_6 logr_7 logr_8 logr_9 logr_10 logr_11 logr_12 mon tues wed thur fri, vce(robust)
}
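Since the constant together with the five day-of-week dummies is perfectly collinear (on trading days the dummies sum to one), one of them will always be dropped. A minimal sketch of the two usual ways around this (untested):

Code:
* treat one day as the base category and interpret the others relative to it
reg logr logr_1 logr_2 logr_3 logr_4 logr_5 logr_6 logr_7 logr_8 logr_9 logr_10 logr_11 logr_12 ///
    mon tues wed thur, vce(robust)
* or suppress the constant and keep all five dummies
reg logr logr_1 logr_2 logr_3 logr_4 logr_5 logr_6 logr_7 logr_8 logr_9 logr_10 logr_11 logr_12 ///
    mon tues wed thur fri, noconstant vce(robust)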
Best regards,

Gianni

Need Professional Help with Advanced Survival Analysis techniques

Hello All,
I am trying to find out if there is anyone who can assist with advanced survival analysis techniques.
I will compensate them financially for their professional services.
I can be reached at meetanwar@gmail.com
Thank you,
Anwar

using the averageif function with "<>" in STATA

I want to create a variable avgcal that, for each brand, gives the average calories of all other brands:

Brand name       Calories

Cornflakes          100
Rice krispies        50
Frosted flake       120

So for cornflakes, I should get (50+120)/2, for rice krispies (100+120)/2, and for frosted flake (100+50)/2. I need a command that will generate a column with these numbers for all the brands. Please help.
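A minimal sketch (untested; assumes the variables are named brand and calories): the leave-one-out average is just the total minus the brand's own value, divided by the number of other brands.

Code:
egen double total_cal = total(calories)
egen n_brands = count(calories)
gen avgcal = (total_cal - calories) / (n_brands - 1)   // average over all other brands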

Conditional logit

Hi all,

I am facing a problem in running the conditional logit model. My data are on firm location choice, which depends on firm-specific characteristics that do not vary across alternatives and region-specific characteristics that are firm-invariant. However, the restaurant example in the Stata manual, as well as the fishing-mode example in Cameron and Trivedi, show that costs and prices vary across alternatives as well as across families/individuals.

My problem is that the state-specific characteristics do not vary across firms. For example, state GDP for a particular state is the same for all firms that locate to that state. The cross-sectional data (hypothetical, meant for explanation only) in wide format are as follows:
Factory id   State chosen   Fixed capital   State GDP growth
1            A              1200            6.5
2            B              1900            5.7
3            C               500            7.0
4            D              2900            7.5
5            D               650            7.5
6            A              1090            6.5
Now, state GDP growth is the same for factories 1 and 6 as well as for factories 4 and 5, since each pair locates to state A and D respectively. However, the Stata manual shows that alternative-specific characteristics (state GDP in this example) also vary across individuals (here, factory id). In the Cameron and Trivedi example on fishing mode, explanatory variables like price vary across alternatives (beach, pier, etc.) as well as across individuals.

I have run the conditional logit model, but I could not get results, as Stata says there is collinearity.

Gaurav.

Jenkins approach: Plotting hazard function for discrete-time analysis, graph remains empty

I am trying to produce a plot for my discrete-time hazard analysis. My dependent variable is 'event' and my variable of interest is a scale from 1 to 11 ('ls'). I am using the material by Prof. Jenkins, according to which I first need to predict the hazard and then plot it using twoway.

The problem: my graph is empty

So far, this is my explanation: I try to take the mean of all variables. However, the mean always has a lot of decimal places (e.g., for "agree", -summarize- shows me a mean of 4.8567). If I put in the exact mean, -gen- still deletes all cases, maybe because the mean actually has more decimal places than -summarize- shows me. Is there a way I can -gen- "h0" using the actual means?

What I want: in the end, I want eleven lines (one for each value of ls from 1 to 11), showing how the hazard changes over the duration (c).

c is my duration variable (with a range from 1 to 45). As you can see, so far I have just plotted the first two lines, h0 and h1.

This is my code:

Code:
set more off
logit event ls i.c sex age isc_2 isc_3 agree & [...], or vce(cluster pid)
predict h, p

g h0 = h if ls == 1 & age==43 & sex==1 & isc_3==1 & agree==5 & [...]
g h1 = h if ls == 2 & age==43 & sex==1 & isc_3==1 & agree==5 & [...]
[...]

twoway (connect h0 c, sort msymbol(t)) (connect h1 c, sort msymbol(o)) ///
    , title("title") saving(graph1, replace)

This is the output:


HTML Code:
. set more off

. logit event ls i.c sex age isc_2 isc_3 agree neuro extra open consc workexp chil linc
> married migration region uerate, or vce(cluster pid)

note: 1.c != 0 predicts failure perfectly
      1.c dropped and 461 obs not used

note: 25.c != 0 predicts failure perfectly
      25.c dropped and 5 obs not used

note: 27.c != 0 predicts success perfectly
      27.c dropped and 1 obs not used

note: 29.c != 0 predicts success perfectly
      29.c dropped and 1 obs not used

note: 30.c != 0 predicts success perfectly
      30.c dropped and 2 obs not used

note: 31.c != 0 predicts success perfectly
      31.c dropped and 1 obs not used

note: 34.c != 0 predicts failure perfectly
      34.c dropped and 1 obs not used

note: 36.c != 0 predicts success perfectly
      36.c dropped and 1 obs not used

note: 37.c != 0 predicts success perfectly
      37.c dropped and 1 obs not used

note: 43.c != 0 predicts success perfectly
      43.c dropped and 1 obs not used

note: 45.c != 0 predicts success perfectly
      45.c dropped and 1 obs not used

note: 42.c omitted because of collinearity
Iteration 0:   log pseudolikelihood = -1827.2122  
Iteration 1:   log pseudolikelihood =  -1515.711  
Iteration 2:   log pseudolikelihood = -1512.9972  
Iteration 3:   log pseudolikelihood = -1512.9923  
Iteration 4:   log pseudolikelihood = -1512.9923  

Logistic regression                             Number of obs     =      2,644
                                                Wald chi2(42)     =     425.49
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -1512.9923               Pseudo R2         =     0.1720

                                (Std. Err. adjusted for 1,555 clusters in pid)
------------------------------------------------------------------------------
             |               Robust
       event | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ls |   1.145906   .0277332     5.63   0.000     1.092819    1.201572
             |
           c |
          1  |          1  (empty)
          2  |   1.552097   2.015046     0.34   0.735     .1218503     19.7702
          3  |   1.377861      1.791     0.25   0.805     .1078398    17.60484
          4  |   .8615638   1.119886    -0.11   0.909     .0674327     11.0079
          5  |   .8883096   1.155572    -0.09   0.927     .0693849    11.37271
          6  |   .7417426   .9649531    -0.23   0.818     .0579298    9.497396
          7  |   .7192293   .9394127    -0.25   0.801     .0556008    9.303662
          8  |    .906454   1.188801    -0.07   0.940     .0693438    11.84907
          9  |   .7811753   1.023806    -0.19   0.851     .0598641    10.19367
         10  |   .5211759   .6839178    -0.50   0.619     .0398096    6.823093
         11  |   .6436618   .8522294    -0.33   0.739      .048044    8.623363
         12  |   1.203656    1.59037     0.14   0.888     .0903285    16.03912
         13  |   .8913859   1.193782    -0.09   0.932     .0645804    12.30357
         14  |   .5162619   .6973234    -0.49   0.625     .0365711    7.287892
         15  |   .4079028   .5549721    -0.66   0.510     .0283436    5.870281
         16  |   .4799991   .6424768    -0.55   0.583     .0348266    6.615614
         17  |   1.055463   1.411439     0.04   0.968      .076764    14.51205
         18  |   .8580779   1.180012    -0.11   0.911      .057939    12.70815
         19  |   1.024327    1.40855     0.02   0.986     .0691756    15.16787
         20  |   1.202334   1.739246     0.13   0.899     .0705849    20.48038
         21  |   .3299143    .625881    -0.58   0.559     .0080091    13.58993
         22  |   .1805353   .2847246    -1.09   0.278     .0082058    3.971944
         23  |   1.125653   1.595326     0.08   0.933     .0699913    18.10361
         24  |   .2354943    .377045    -0.90   0.366     .0102127     5.43027
         25  |          1  (empty)
         26  |    6.04836   10.64548     1.02   0.307     .1920752    190.4601
         27  |          1  (empty)
         28  |   1.520192    2.43863     0.26   0.794     .0655293    35.26642
         29  |          1  (empty)
         30  |          1  (empty)
         31  |          1  (empty)
         34  |          1  (empty)
         36  |          1  (empty)
         37  |          1  (empty)
         42  |          1  (omitted)
         43  |          1  (empty)
         45  |          1  (empty)
             |
         sex |   .8228348   .0901818    -1.78   0.075     .6637764    1.020008
         age |   .9536672   .0084241    -5.37   0.000     .9372983    .9703219
       isc_2 |   1.305533   .1609311     2.16   0.031     1.025325    1.662319
       isc_3 |   2.160438   .3524755     4.72   0.000     1.569163    2.974509
       agree |   .9728213   .0514318    -0.52   0.602     .8770638    1.079034
      [...]
       _cons |   .0000106   .0000195    -6.22   0.000     2.85e-07    .0003915
------------------------------------------------------------------------------

. predict h, p
(17506 missing values generated)

. g h0 = h if ls == 1 & age==43 & sex==1 & isc_3==1 & agree==5 & [...]
(20,611 missing values generated)

. g h1 = h if ls == 2 & age==43 & sex==1 & isc_3==1 & agree==5 & [...]
(20,611 missing values generated)


And the graph, of course, is empty, which makes sense, since all my cases appear to be deleted when I create "h0" and "h1".


Is there a better way to incorporate the means? Or is there maybe something wrong with my approach?
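A minimal sketch of an alternative (untested): rather than conditioning on exact covariate values, which rarely match any observation, -margins- can hold the other covariates at their means and compute the predicted hazard for each duration c and each value of ls. Only the covariates shown in the post are listed here; the remaining ones would be added exactly as in the original model.

Code:
logit event ls i.c sex age isc_2 isc_3 agree, vce(cluster pid)
margins c, at(ls = (1(1)11)) atmeans
marginsplot, xdimension(c) noci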

Renaming multiple dummy variables using a loop

Hello,

I have a dataset in which I have created 60 dummy variables (named dummy1, dummy2, dummy3, ..., dummy64) using the 'tabulate' command. Now I would like to rename these variables. For example, I want to rename dummy1 to Agate, dummy2 to iron, dummy3 to Zinc, and so on. How can I do this for all my variables in the simplest form?
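A minimal sketch (untested): put the new names in a local macro, in the same order as the dummies, and loop over them.

Code:
local newnames Agate iron Zinc   // ...extend this list with all the new names, in order
local i = 1
foreach new of local newnames {
    rename dummy`i' `new'
    local ++i
}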

Thank you.

Create my own panel data by specifying the underlying DGP

Dear All,
I am trying to generate my own panel data. I know how to create purely cross sectional data:

Code:
clear
set obs 10000
gen x1 = rnormal(1,2)
gen x2= rnormal(0,3)
gen eps = rnormal(0,4)
gen y = 3*x1+2*x2+eps
reg y x1 x2
Basically I create the DGP and can use regression commands to recover the true coefficients. I would like to do the same thing, but with panel data. Googling only led to the sample command, which is used to draw a subsample of a given dataset, but I cannot find anything that helps me create (random) data based on a specified DGP.

Can anyone point me to the correct commands?
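A minimal sketch extending the cross-sectional example above to a panel with unit-specific effects (untested; the sample sizes and coefficient values are arbitrary):

Code:
clear
set obs 500                       // 500 panel units
gen id = _n
gen alpha_i = rnormal(0, 1)       // unit-specific effect
expand 10                         // 10 time periods per unit
bysort id: gen t = _n
xtset id t
gen x1 = rnormal(1, 2)
gen x2 = rnormal(0, 3)
gen eps = rnormal(0, 4)
gen y = 3*x1 + 2*x2 + alpha_i + eps
xtreg y x1 x2, fe                 // recovers the slopes 3 and 2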

Thanks in advance!

Best,

Byte variable encoding

Hello!

I'm having an issue with my data. I have a dataset with variables such as "type of labor contract", "job occupation", etc. The values of these variables are displayed as text (long-term contract, short-term contract), yet the variables are stored as "byte", not string.
I would like to encode the different outcomes of these variables as numeric values, say: permanent labor contract = 1, short-term contract = 2, etc.

How should I do that?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(laborcontract occupjob)
. .
1 2
1 4
1 3
2 4
. .
2 6
3 4
. .
2 4
2 2
2 2
. .
. .
2 3
. .
. .
. .
2 2
2 2
2 6
2 6
. .
. .
. .
. .
2 2
. .
1 2
2 6
2 4
2 4
end
label values laborcontract c22
label def c22 1 "permanent", modify
label def c22 2 "long term contract worker (one year and above)", modify
label def c22 3 "short term contract worker (less than one year)", modify
label def c22 4 "non-contract temp", modify
label values occupjob c09
label def c09 1 "Principals in State Agencies, Party organizations, enterprises and public service unit", modify
label def c09 2 "Professional technicians", modify
label def c09 3 "Clerk and relating personnel", modify
label def c09 4 "Commercial and service personnel", modify
label def c09 6 "Manufacturing and transporting equipment manipulator and relating personnel", modify
It looks like the variables are already encoded, but when I look at the data in the Data Browser it is the value label that shows up, and I'd like to get rid of that so that just the numeric value is displayed.
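A minimal sketch (untested): the variables are indeed already numeric; the browser is simply displaying their value labels, which can be hidden or detached.

Code:
browse laborcontract occupjob, nolabel   // view the underlying numeric codes
* or detach the value labels entirely:
label values laborcontract .
label values occupjob .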

numbering trading days

Dear all,

I have daily panel data on the returns for 225 stocks over 3 years. Now I want to generate a variable (day) that numbers my trading days (date) in Stata, such that the weekend days are ignored. This is probably a simple problem, but I can't seem to find the answer, since I have been using Stata for only a few days now. Below is an illustration of what I want to achieve.
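A minimal sketch (untested; stock_id stands in for whatever the panel identifier is): if the data contain only trading days, numbering the dates consecutively within each stock is enough; Stata's business calendars are an alternative when a proper trading-day calendar is needed.

Code:
bysort stock_id (date): gen day = _n     // consecutive trading-day counter per stock
* alternatively, build a business calendar from the observed dates:
* bcal create trading, from(date) replace
* gen day = bofd("trading", date)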
Best regards,

Gianni


T-test?

Hello all,

I am hoping to get some thoughts on the appropriate models/tests that make the most sense for an analysis.

I am comparing three networks' coverage of a certain televised event, using data generated by a content analysis. One network is broadcasting the event, while the other two are not. Although there are three networks, our theory does not really have a clear prediction of how the two non-broadcasting networks will cover the event relative to one another. Instead, we simply predict that the broadcasting network will cover the event differently than the two non-broadcasting networks. So, to me, that means a chi-squared or ANOVA test is not really appropriate, as that would be testing the prediction that all three networks' coverage differs from one another, which is not what we predict. This also means that a tau-b test is definitely out, because it tests not only for differences between all three but also for a monotonic relationship among the differences.

This suggests to me that what we really want to be running is a t-test in which we compare the broadcasting network's coverage of the event to the two non-broadcasting networks' coverage of the same event (pooled together). If we want, we could also just run simple OLS regressions with the hosting network as the baseline and the two non-hosting networks as dummies.

However, I want to make sure that there is not another model that might test our prediction more rigorously or robustly. Is there another type of test I should look into? I would love to hear any thoughts anyone might have on this, because I want to make sure I am using the "preferred" model for what I am testing.

Thank you!!

Running the same Fama-MacBeth annual cross-sectional regression for different sub-samples

First of all, this is the first time I am actively participating in this forum. I have checked the forum archive; however, I could not find any thread about my particular problem.

I would like to thank you for your answers in advance.

My problem is that, as you can see from the title, I need to run the same Fama-MacBeth annual cross-sectional regression for different sub-samples.

My data-set is composed of British, German and French listed companies, but I need to run the Fama-MacBeth annual cross-sectional regression for each country separately.

Here is the code I have:
Code:
xtset ID YEAR
set more off
gen a0=.
gen a1=.
gen b1=.
gen b2=.
gen meanCFO=.
gen abnormalCFO=.
forvalues i=2005 (1) 2014{
regress CFOTAL DTAL SALETAL ChnSALETAL if YEAR==`i'
replace a0=_b[_cons] if YEAR==`i'
replace a1=_b[DTAL] if YEAR==`i'
replace b1=_b[SALETAL] if YEAR==`i'
replace b2=_b[ChnSALETAL] if YEAR==`i'
replace meanCFO= a0+a1*DTAL+b1*SALETAL+b2*ChnSALETAL if YEAR==`i'
replace abnormalCFO=CFOTAL-meanCFO if YEAR==`i'
}
fm CFOTAL DTAL SALETAL ChnSALETAL

I tried "bysort:" but it did not work. I don't know in which stage of the code I should add additional command for running the regression for different sub-samples.

Your opinions will help me a lot. Thank you in advance for your answers and for your interest.