From randy@stat.tamu.edu Mon Mar 20 14:20:13 1995
>From randy@stat.tamu.edu  Mon Mar 20 14:20:12 1995
Received: from picard.tamu.edu (PICARD.TAMU.EDU [165.91.112.177]) by linc.cis.upenn.edu (8.6.10/UPenn 1.4) with SMTP
	id OAA28364 for <benh@linc.cis.upenn.edu>; Mon, 20 Mar 1995 14:20:10 -0500
Posted-Date: Mon, 20 Mar 1995 14:20:10 -0500
Received-Date: Mon, 20 Mar 1995 14:20:10 -0500
Received: by picard.tamu.edu (AA01797); Mon, 20 Mar 95 13:18:20 CST
Date: Mon, 20 Mar 95 13:18:20 CST
From: randy@PICARD.tamu.edu (Randy Hutson)
Message-Id: <9503201918.AA01797@picard.tamu.edu>
To: benh@linc.cis.upenn.edu
Subject: Re:  Profiling results
Status: R

>It looks to me that "floor_grid()" *might* benefit from being made into
>a macro, but I do not know how to tell the "speedup" of making a function
>into a macro.  I assume you lose all data on the contribution due to that
>function.

That is a problem, but it's still possible to get a good idea if a macro
version is faster. For example, in one profile I did, floor_grid() was
called 4169330 out of 6240772 times from update_view_aux(). These
4169330 calls tooks 4.56 CPU seconds. update_view_aux() was called
2084665 times for a total of 3.73 CPU seconds. Now if floor_grid() were
rewritten as a macro, update_view_aux()'s execution time could be compared
to 4.56 + 3.73 = 8.29 seconds. If it was noticeably less than 8.29 seconds,
this is a good indication that a macro version of floor_grid() is faster.
A similar comparison could be made to process_monsters() which is the
second most frequent caller of floor_grid.


>Unfortunately, there *is* a way to speed up "floor_grid()" a
>*lot* and that is to make *two* versions of it, one which *assumes* that
>the given location is valid.  And then have the code call this "dangerous"
>version when possible.  Note that usually, about half of the function is
>spent calling "in_bounds()".

I noticed that, but I had no idea if that check could possibly be removed.

>I am pretty sure that *some* of the calls to
>"floor_grid()" can be replaced by the "dangerous" version.

I was curious about this, so I added some information gathering code
into the game. I played for 10 minutes or so, and none of the calls to
floor_grid from make_move(), mon_move(), and process_monsters() were
ever illegal. The update_view_aux functions did get a lot of out of
bounds hits though.

>For my edification, could you run tests 1 and 2 with the player doing something
>like "resting" for 1000 turns.

OK, I've done this. You might be able to profile the game and test it
yourself if you have access to a BSDish system (Sun OS included). I'm
using "gprof" which is a standard BSD utility. It's not difficult to
get profiling data from gprof, although assimilating it into a useful
form isn't always trivial. If you can find gprof, just compile and link
the game with the "-pg" switch sent to the compiler (this works with gcc).
Then run the game normally. When the program exits, a file named "gmon.out"
is created. You then run "gprof angband", and it will write a lot of
information to stdout (so redirect it to a file).

>Your data seems to suggest that (at least when
>an orc pit is nearby) the "pre-compute view" code is about as efficient as
>not having it.  I like that... :-)

I'd say that using the pre-compute view option in the dungeon is to
moderately more efficient than not using it. It seems to slow down the
game noticeably in wide open areas such as the town though. (It was 13%
faster when just resting though.)

>Oh, note that I chose "resting" because "update_view()" is only called when
>the player moves, so don't do 100 calls to "stay here" :-)

>As always, thanks a *lot* for your help...

You're welcome! Thanks for spending a lot more time on the game yourself.


OK, here's the profiling results for resting 1000 turns below the orc pit with
the pre-compute option on. As before, I've culled out (most) "system"
functions and am just including those few functions which used 1% or more
of CPU time:

% time  seconds   cumsec    calls   function
 18.27     3.05     3.05     9802   process_monsters
 13.48     2.25     5.30    75019   make_move
  6.59     1.10     6.40   358310   mmove
  6.29     1.05     7.45   544841   floor_grid
  3.54     0.59     9.02    75019   mon_move
  3.00     0.50     9.52    74629   get_moves
  2.40     0.40     9.92    74629   mon_will_run
  2.10     0.35    10.27   126181   random_hack
  2.04     0.34    10.61   158932   player_has_los
  2.04     0.34    10.95    56525   QueueAttrChar
  2.04     0.34    11.29    56514   map_info_aux
  1.74     0.29    11.92    56521   map_info
  1.32     0.22    12.14     1033   select
  1.14     0.19    12.33    56270   lite_spot
  1.08     0.18    12.51    74629   get_moves_aux
  1.08     0.18    12.69    18976   update_mon
  1.02     0.17    12.86    48612   sf_put
  1.02     0.17    13.03     1828   string_make

Total CPU time used by game code: 16.69 seconds


And here's the profile without pre-computing:

% time  seconds   cumsec    calls   function
 17.16     3.29     3.29     9802   process_monsters
 10.90     2.09     5.38    75019   make_move
 10.12     1.94     7.32  1048688   floor_grid
  7.04     1.35     8.67   188550   los
  4.90     0.94    10.58   358310   mmove
  2.87     0.55    11.13    75019   mon_move
  2.45     0.47    11.60    74629   mon_will_run
  2.40     0.46    12.06    74629   get_moves
  2.30     0.44    12.50   158914   player_has_los
  1.98     0.38    12.88   126181   random_hack
  1.98     0.38    13.26    56292   map_info_aux
  1.56     0.30    13.89    56296   map_info
  1.36     0.26    14.15   192940   distance
  1.20     0.23    14.38     1033   select
  1.15     0.22    14.60    18976   update_mon
  1.15     0.22    15.04     1828   string_make

Total CPU time used by game code: 19.17 seconds


Oh, I have no good explanation as to why make_move took 2.25 seconds
with pre-compute and 2.09 seconds without. I ran these tests several
times and make_move() was sometimes faster with pre-computing and sometimes
it was slower. (It shouldn't make any difference!) I'd consider this
an insignificant anomaly perhaps related to timing round off errors.
(There are other anomalies, for example, string_make().) I think the
point is, don't put much faith in the apparent preciseness of gprof's
numbers.

Randy