
## Computer architecture textbook pdf


Abstract: The computing world today is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation. The Fifth Edition of Computer Architecture focuses on this dramatic shift, exploring the ways in which software and technology in the "cloud" are accessed by cell phones, tablets, laptops, and other mobile computing devices. Each chapter includes two real-world examples, one mobile and one datacenter, to illustrate this revolutionary change.

The edition is updated to cover the mobile computing revolution. It emphasizes the two most important topics in architecture today: memory hierarchy and parallelism in all its forms. It develops common themes throughout each chapter: power, performance, cost, dependability, protection, programming models, and emerging trends ("What's Next"). It includes three review appendices in the printed text; additional reference appendices are available online. It also includes updated Case Studies and completely new exercises.

Adve, S. V., and K. Gharachorloo [1996]. "Shared memory consistency models: A tutorial," IEEE Computer 29:12 (December), 66-76. Adve, S. V., and M. D. Hill [1990]. "Weak ordering--a new definition," Proc. 17th Annual Int'l. Symposium on Computer Architecture (ISCA), May 28-31, 1990, Seattle, Wash., 2-14. Agarwal, A. [1987]. "Analysis of Cache Performance for Operating Systems and Multiprogramming," Ph. D. thesis, Tech. Rep. No. CSL-TR-87-332, Stanford University, Palo Alto, Calif. Agarwal, A. [1991]. "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems 2:4 (April), 398-412. Agarwal, A., and S. D. Pudar [1993]. "Column-associative caches: A technique for reducing the miss rate of direct-mapped caches," 20th Annual Int'l. Symposium on Computer Architecture (ISCA), May 16-19, 1993, San Diego, Calif. Also appears in Computer Architecture News 21:2 (May), 179-190, 1993.




Agarwal, A., R. Bianchini, D. Chaiken, K. Johnson, and D. Kranz [1995]. "The MIT Alewife machine: Architecture and performance," Int'l. Symposium on Computer Architecture (Denver, Colo.), June, 2-13. Agarwal, A., J. L. Hennessy, R. Simoni, and M. A. Horowitz [1988]. "An evaluation of directory schemes for cache coherence," Proc. 15th Int'l. Symposium on Computer Architecture (June), 280-289. Agarwal, A., J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin [1993]. "Sparcle: An evolutionary processor design for large-scale multiprocessors," IEEE Micro 13 (June), 48-61. Agerwala, T., and J. Cocke [1987]. High-Performance Polygon Rendering," Proc. 15th Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH 1988), August 1-5, 1988, Atlanta, Ga., 239-246. Alexander, W. G., and D. B. Wortman [1975]. "Static and dynamic characteristics of XPL programs," IEEE Computer 8:11 (November), 41-46. Alles, A. [1995]. "ATM Internetworking," White Paper (May), Cisco Systems, Inc., San Jose, Calif. (www.cisco.com/warp/public/614/12.html). Alliant. [1987]. Alliant FX/Series: Product Summary, Alliant Computer Systems Corp., Acton, Mass. Almasi, G. S., and A. Gottlieb [1989]. Highly Parallel Computing, Benjamin/Cummings, Redwood City, Calif. Alverson, G., R. Alverson, D. Callahan, B. Koblenz, A. Porterfield, and B. Smith [1992]. "Exploiting heterogeneous parallelism on a multithreaded multiprocessor," Proc. ACM/IEEE Conf. on Supercomputing, November 16-20, 1992, Minneapolis, Minn., 188-197. Amdahl, G. [1967]. "Validity of the single processor approach to achieving large scale computing capabilities," Proc. AFIPS Spring Joint Computer Conf., April 18-20, 1967, Atlantic City, N. J., 483-485. Amdahl, G. M., G. A. Blaauw, and F. P. Brooks, Jr. [1964]. "Architecture of the IBM System 360," IBM J. Research and Development 8:2 (April), 87-101. Amza, C., A. L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel [1996]. "Treadmarks: Shared memory computing on networks of workstations," IEEE Computer 29:2 (February), 18-28. Anderson, D. [2003]. "You don't know jack about disks," Queue 1:4 (June), 20-30. Anderson, D., J. Dykes, and E. Riedel [2003]. "SCSI vs. ATA--More than an interface," Proc. 2nd USENIX Conf. on File and Storage Technology (FAST '03), March 31-April 2, 2003, San Francisco. Anderson, D. W., F. J. Sparacio, and R. M. Tomasulo [1967]. "The IBM 360 Model 91: Processor philosophy and instruction handling," IBM J. Research and Development 11:1 (January), 8-24. Anderson, M. H. [1990]. "Strength (and safety) in numbers (RAID, disk storage technology)," Byte 15:13 (December), 337-339. Anderson, T. E., D. Culler, and D. Patterson [1995]. "A case for NOW (networks of workstations)," IEEE Micro 15:1 (February), 54-64. Ang, B., D. Chiou, D. Rosenband, M. Ehrlich, L. Rudolph, and Arvind [1998]. "StarTVoyager: A flexible platform for exploring scalable SMP issues," Proc. ACM/IEEE Conf. on Supercomputing, November 7-13, 1998, Orlando, FL. Anjan, K. V., and T. M. Pinkston [1995]. "An efficient, fully-adaptive deadlock recovery scheme: Disha," Proc. 22nd Annual Int'l. Symposium on Computer Architecture (ISCA), June 22-24, 1995, Santa Margherita, Italy. Anon. et al. [1985]. A Measure of Transaction Processing Power, Tandem Tech. TR85.2. Also appears in Datamation 31:7 (April), 112-118, 1985. Apache Hadoop. [2011]., J., and J.-L. Baer [1986]. "Cache coherence protocols: Evaluation model," ACM Trans. on Computer Systems 4:4 (November), 273-298. Armbrust, M., A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia [2009]. Above the Clouds: A Berkeley View of Cloud Computing, Tech. Rep. UCB/EECS-2009-28, University of California, Berkeley. Arpaci, R. H., D. E. Culler, A. Krishnamurthy, S. G. Steinberg, and K. Yelick [1995]. "Empirical evaluation of the CRAY-T3D: A compiler perspective," 22nd Annual Int'l. Symposium on Computer Architecture (ISCA), June 22-24, 1995, Santa Margherita, Italy. Asanovic, K. [1998]. "Vector Microprocessors," Ph. D. thesis, Computer Science Division, University of California, Berkeley. Associated Press. [2005]. "Gap Inc. shuts down two Internet stores for major overhaul," USATODAY.com, August 8, 2005. Atanasoff, J. V. [1940]. Computing Machine for the Solution of Large Systems of Linear Equations, Internal Report, Iowa State University, Ames. Atkins, M. [1991]. Performance and the i860 Microprocessor, IEEE Micro, 11:5 (September), 24-27, 72-78. Austin, T. M., and G. Sohi [1992]. "Dynamic dependency analysis of ordinary programs," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 342-351.

Babbay, F., and A. Mendelson [1998]. "Using value prediction to increase the power of speculative execution hardware," ACM Trans. on Computer Systems 16:3 (August), 234-270. Baer, J.-L., and W.-H. Wang [1988]. "On the inclusion property for multi-level cache hierarchies," Proc. 15th Annual Int'l. Symposium on Computer Architecture, May 30-June 2, 1988, Honolulu, Hawaii, 73-80. Bailey, D. H., E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga [1991]. "The NAS parallel benchmarks," Int'l. J. Supercomputing Applications 5, 63-73. Bakoglu, H. B., G. F. Grohoski, L. E. Thatcher, J. A. Kaeli, C. R. Moore, D. P. Tattle, W. E. Male, W. R. Hardell, D. A. Hicks, M. Nguyen Phu, R. K. Montoye, W. T. Glover, and S. Dhawan [1989]. "IBM second-generation RISC processor organization," Proc. IEEE Int'l. Conf. on Computer Design, September 30-October 4, 1989, Rye, N.Y., 138-142. Balakrishnan, H., V. N. Padmanabhan, S. Seshan, and R. H. Katz [1997]. "A comparison of mechanisms for improving TCP performance over wireless links," IEEE/ACM Trans. on Networking 5:6 (December), 756-769. Ball, T., and J. Larus [1993]. "Branch prediction for free," Proc. ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), June 23-25, 1993, Albuquerque, N. M., 300-313. Banerjee, U. [1979]. "Speedup of Ordinary Programs," Ph. D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign. Barham, P., B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, and R. Neugebauer [2003]. "Xen and the art of virtualization," Proc. of the 19th ACM Symposium on Operating Systems Principles, October 19-22, 2003, Bolton Landing, N.Y. Barroso, L. A. [2010]. "Warehouse Scale Computing [keynote address]," Proc. ACM SIGMOD, June 8-10, 2010, Indianapolis, Ind. Barroso, L. A., and U. Holzle [2007]. "The case for energy-proportional computing," IEEE Computer 40:12 (December), 33-37. Barroso, L. A., and U. Holzle [2009]. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, San Rafael, Calif. Barroso, L. A., K. Gharachorloo, and E. Bugnion [1998]. "Memory system characterization of commercial workloads," Proc. 25th Annual Int'l. Symposium on Computer Architecture (ISCA), July 3-14, 1998, Barcelona, Spain, 3-14. Barton, R. S. [1961]. "A new approach to the functional design of a computer," Proc. Western Joint Computer Conf., May 9-11, 1961, Los Angeles, Calif., 393-396. Bashe, C. J., W. Buchholz, G. V. Hawkins, J. L. Ingram, and N. Rochester [1981]. "The architecture of IBM's early computers," IBM J. Research and Development 25:5 (September), 363-375. Bashe, C. J., L. R. Johnson, J. H. Palmer, and E. W. Pugh [1986]. IBM's Early Computers, MIT Press, Cambridge, Mass.

Baskett, F., and T. W. Keller [1977]. "An evaluation of the Cray-1 processor," in High Speed Computer and Algorithm Organization, D. J. Kuck, D. H. Lawrie, and A. H. Sameh, eds., Academic Press, San Diego, 71-84. Baskett, F., T. Jermoluk, and D. Solomon [1988]. "The 4D-MP graphics superworkstation: Computing + graphics = 40 MIPS + 40 MFLOPS and 10,000 lighted polygons per second," Proc. IEEE COMPCON, February 29-March 4, 1988, San Francisco, 468-471. BBN Laboratories. [1986]. Butterfly Parallel Processor Overview, Tech. Rep. 6148, BBN Laboratories, Cambridge, Mass. Bell, C. G. [1984]. "The mini and micro industries," IEEE Computer 17:10 (October), 14-30. Bell, C. G. [1989]. "Multis: A new class of multiprocessor computers in science and engineering," Communications of the ACM 32:9 (September), 1091-1101. Bell, G., and J. Gray [2001]. Crays, Clusters and Centers, Tech. Rep. MSR-TR-2001-76, Microsoft Research, Redmond, Wash. Bell, C. G., and J. Gray [2002]. "What's next in high performance computing?" CACM 45:2 (February), 91-95. Bell, C. G., and W. D. Strecker [1976]. "Computer structures: What have we learned from the PDP-11?," Third Annual Int'l. Symposium on Computer Architecture (ISCA), January 19-21, 1976, Tampa, Fla., 1-14. Bell, C. G., and W. D. Strecker [1998]. "Computer structures: What have we learned from the PDP-11?" 25 Years of the International Symposia on Computer Architecture (Selected Papers), ACM, New York, 138-151. Bell, C. G., J. C. Mudge, and J. E. McNamara [1978]. A DEC View of Computer Engineering, Digital Press, Bedford, Mass. Bell, C. G., R. Cady, H. McFarland, B. DeLagi, J. O'Laughlin, R. Noonan, and W. Wulf [1970]. "A new architecture for mini-computers: The DEC PDP-11," Proc. AFIPS Spring Joint Computer Conf., May 5-May 7, 1970, Atlantic City, N. J., 657-675. Benes, V. E. [1962]. "Rearrangeable three stage connecting networks," Bell System Technical Journal 41, 1481-1492. Bertozzi, D., A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou, L. Benini, and G. De Micheli [2005]. "NoC synthesis flow for customized domain specific multiprocessor systems-on-chip," IEEE Trans. on Parallel and Distributed Systems 16:2 (February), 113-130. Bhandarkar, D. P. [1995]. Alpha Architecture and Implementations, Digital Press, Newton, Mass. Bhandarkar, D. P., and D. W. Clark [1991]. "Performance from architecture: Comparing a RISC and a CISC with similar hardware organizations," Proc. Fourth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 8-11, 1991, Palo Alto, Calif., 310-319. Bhandarkar, D. P., and J. Ding [1997]. "Performance Computer Architecture, February 1-February 5, 1997, San Antonio, Tex., 288-297. Bhuyan, L. N., and D. P. Agrawal [1984]. "Generalized hypercube and hyperbus structures for a computer network," IEEE Trans. on Computers 32:4 (April), 322-333. Bienia, C., S. Kumar, P. S. Jaswinder, and K. Li [2008]. The Parsec Benchmark Suite: Characterization and Architectural Implications, Tech. Rep. TR-811-08, Princeton University, Princeton, N. J. Bier, J. [1997]. "The Evolution of DSP Processors," presentation at University of California, Berkeley, November 14. Bird, S., A. Phansalkar, L. K. John, A. Mericas, and R. Indukuru [2007]. "Characterization of performance of SPEC CPU benchmarks on Intel's Core Microarchitecture based processor," Proc. 2007 SPEC Benchmark Workshop, January 21, 2007, Austin, Tex. Birman, M., A. Samuels, G. Chu, T. Chuk, L. Hu, J. McLeod, and J. Barnes [1990]. "Developing the WRL3170/3171 SPARC floating-point coprocessors," IEEE Micro 10:1, 55-64. Blackburn, M., R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann [2006]. "The DaCapo benchmarks: Java benchmark

Information Theory, IT-42 (March), 529-42. Blaum, M., J. Brady, J. Bruck, and J. Menon [1994]. "EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures," Proc. 21st Annual Int'l. Symposium on Computer Architecture. "EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures," IEEE Trans. on Computers 44:2 (February), 192-202. Blaum, M., J. Brady, J. Bruck, J. Menon, and A. Vardy [2001]. "The EVENODD code and its generalization," in H. Jin, T. Cortes, and R. Buyya, eds., High Performance Mass Storage and Parallel I/O: Technologies and Applications, Wiley-IEEE, New York, 187-208. Bloch, E. [1959]. "The engineering design of the Stretch computer," 1959 Proceedings of the Eastern Joint Computer Conf., December 1-3, 1959, Boston, Mass., 48-59. Boddie, J. R. [2000]. "History of DSPs," www.lucent.com/micro/dsp/dsphist.html. Bolt, K. M. [2005]. "Amazon sees sales rise, profit fall," Seattle Post-Intelligencer, October 25. Bordawekar, R., U. Bondhugula, R. Rao [2010]. "Believe It or Not!: Multi-core CPUs can Match GPU Performance for a FLOP-Intensive Application!" 19th International Conference on Parallel Architecture and Compilation Techniques (PACT 2010), Vienna, Austria, September 11-15, 2010, 537-538. Borg, A., R. E. Kessler, and D. W. Wall [1990]. "Generation and analysis of very long address traces," 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 270-279. Bouknight, W. J., S. A. Deneberg, D. E. McIntyre, J. M. Randall, A. H. Sameh, and D. L. Slotnick [1972]. "The Illiac IV system," Proc. IEEE 60:4, 369-379. Also appears in D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982, 306-316. Brady, J. T. [1986]. "A theory of productivity in the creative process," IEEE CG&A (May), 25-34. Brain, M. [2000]. "Inside a Digital Cell Phone," www.howstuffworks.com/insidecellphone.htm. Brandt, M., J. Brooks, M. Cahir, T. Hewitt, E. Lopez-Pineda, and D. Sandness [2000]. The Benchmarker's Guide for Cray SV1 Systems, Cray Inc., Seattle, Wash. Brent, R. P., and H. T. Kung [1982]. "A regular layout for parallel adders," IEEE Trans. on Computers C-31, 260-264. Brewer, E. A., and B. C. Kuszmaul [1994]. "How to get good performance from the CM-5 data network," Proc. Eighth Int'l. Parallel Processing Symposium, April 26-27, 1994, Cancun, Mexico. Brin, S., and L. Page [1998]. "The anatomy of a large-scale hypertextual Web search engine," Proc. 7th Int'l. World Wide Web Conf., April 14-18, 1998, Brisbane, Queensland, Australia, 107-117. Brown, A., and D. A. Patterson [2000]. "Towards maintainability, availability, and growth benchmarks: A case study of software RAID systems," Proc. 2000 USENIX Annual Technical Conf., June 18-23, 2000, San Diego, Calif. Bucher, I. V., and A. H. Hayes [1980]. "I/O performance Evaluation Users Group, 16th Meeting, NBS 500-65, 245-254. Bucher, I. Y. [1983]. "The computational speed of supercomputers," Proc. Int'l. Conf. on Measuring and Modeling of Computer Systems (SIGMETRICS 1983), August 29-31, 1983, Minneapolis, Minn., 151-165. Bucholtz, W. [1962]. Planning a Computer System

Computers 44:7, 933-938. Burkhardt III, H., S. Frank, B. Knobe, and J. Rothnie [1992]. Overview of the KSR1 Computer System, Tech. Rep. KSR-TR-9202001, Kendall Square Research, Boston, Mass. Burks, A. W., H. H. Goldstine, and J. von Neumann [1946]. "Preliminary discussion of the logical design of an electronic computing instrument," Report to the U.S. Army Ordnance Department, p. 1; also appears in Papers of John von Neumann, W. Aspray and A. Burks, eds., MIT Press, Cambridge, Mass., and Tomash Publishers, Los Angeles, Calif., 1987, 97-146.

Calder, B., G. Reinman, and D. M. Tullsen [1999]. "Selective value prediction," Proc. 26th Annual Int'l. Symposium on Computer Architecture (ISCA), May 2-4, 1999, Atlanta, Ga. Calder, B., D. Grunwald, M. Jones, D. Lindsay, J. Martin, M. Mozer, and B. Zorn [1997]. "Evidence-based static branch prediction using machine learning," ACM Trans. Program. Lang. Syst. 19:1, 188-222. Callahan, D., J. Dongarra, and D. Levine [1988]. "Vectorizing compilers: A test suite and results," Proc. ACM/IEEE Conf. on Supercomputing, November 12-17, 1988, Orlando, Fla., 98-105. Cantin, J. F., and M. D. Hill [2001]. "Cache Performance for SPEC CPU2000 Benchmarks," www.jfred.org/cache-data.html (June). Cantin, J. F., and M. D. Hill [2003]. "Cache Performance for SPEC CPU2000 Benchmarks," www.cs.wisc.edu/multifacet/misc/spec2000cache-data/index.html. Carles, S. [2005]. "Amazon reports record Xmas season, top game picks," Gamasutra, December 27 ( J., and K. Rajamani [2010]. "Designing energy-efficient servers and data centers," IEEE Computer 43:7 (July), 76-78. Case, R. P., and A. Padegs [1978]. "The architecture of the IBM System/370," Communications of the ACM 21:1, 73-96. Also appears in D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples and Examples in multicache systems," IEEE Trans. on Computers C-27:12 (December), 1112-1118. Chandra, R., S. Devine, B. Verghese, A. Gupta, and M. Rosenblum [1994]. "Scheduling and page migration for multiprocessor compute servers," Sixth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 4-7, 1994, San Jose, Calif., 12-24. Chang, F., J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber [2006]. "Bigtable: A distributed storage system for structured data," Proc. 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI '06), November 6-8, 2006, Seattle, Wash. Chang, J., J. Meza, P. Ranganathan, C. Bash, and A. Shah [2010]. "Green server design: Beyond operational energy to sustainability," Proc. Workshop on Power Aware Computing and Systems (HotPower '10), October 3, 2010, Vancouver, British Columbia. Chang, P. P., S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu [1991]. "IMPACT: An architectural framework for multiple-instruction-issue processors," 18th Annual Int'l. Symposium on Computer Architecture (ISCA), May 27-30, 1991, Toronto, Canada, 266-275. Charlesworth, A. E. [1981]. "An approach to scientific array processing: The architecture design of the AP-120B/FPS-164 family," Computer 14:9 (September), 18-27. Charlesworth, A. [1998]. "Starfire: Extending the SMP envelope," IEEE Micro 18:1 (January/February), 39-49. Chen, P. M., and E. K. Lee [1995].

136-145. Chen, P. M., G. A. Gibson, R. H. Katz, and D. A. Patterson [1990]. "An evaluation of redundant arrays of inexpensive disks using an Amdahl 5890," Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, May 22-25, 1990, Boulder, Colo. Chen, P. M., E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson [1994]. "RAID: High-performance, reliable secondary storage," ACM Computing Surveys 26:2 (June), 145-188. Chen, S. [1983]. "Large-scale and high-speed multiprocessor system for scientific applications," Proc. NATO Advanced Research Workshop on High-Speed Computing, June 20-22, 1983, Julich, West Germany. Also appears in K. Hwang, ed., "Superprocessors: Design and applications," IEEE (August), 602-609, 1984. Chen, T. C. [1980]. "Overlap and parallel processing," in H. Stone, ed., Introduction to Computer Architecture, Science Research Associates, Chicago, 427-486. Chow, F. C. [1983]. "A Portable Machine-Independent Global Optimizer--Design and Measurements," Ph. D. thesis, Stanford University, Palo Alto, Calif. Chrysos, G. Z., and J. S. Emer [1998]. "Memory dependence prediction using store sets," Proc. 25th Annual Int'l. Symposium on Computer Architecture (ISCA), July 3-14, 1998, Barcelona, Spain, 142-153. Clark, B., T. Dow, S. Evanchik, M. Finlayson, J. Herne, and J. Neefe Matthews [2004]. "Xen and the art of repeated research," Proc. USENIX Annual Technical Conf., June 27-July 2, 2004, 135-144. Clark, D. W. [1983]. "Cache performance of the VAX-11/780," ACM Trans. on Computer Systems 1:1, 24-37.

Clark, D. W. [1987]. "Pipelining and performance in the VAX 8800 processor," Proc. Second Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 5-8, 1987, Palo Alto, Calif., 173-177. Clark, D. W., and J. S. Emer [1985]. "Performance of the VAX-11/780," Proc. Ninth Annual Int'l. Symposium on Computer Architecture (ISCA), April 26-29, 1982, Austin, Tex., 9-17. Clark, D., and W. D. Strecker [1980]. "Comments on 'the case for the reduced instruction set computer Architecture News 8:6 (October), 34-38. Clark, W. A. [1957]. "The Lincoln TX-2 computer development," Proc. Western Joint Computer Conference, February 26-28, 1957, Los Angeles, 143-145. Clidaras, J., C. Johnson, and B. Felderman [2010]. Private communication. Climate Savers Computing Initiative. [2007]. "Efficiency Specs," . climatesaverscomputing.org/.Clos, C. [1953]. "A study of non-blocking switching networks," Bell Systems Technical Journal 32 (March), 406-424.Cody, W. J., J. T. Coonen, D. M. Gay, K. Hanson, D. Hough, W. Kahan, R. Karpinski, J. Palmer, F. N. Ris, and D. Stevenson [1984]. "A proposed radix- and word-lengthindependent standard for floating-point arithmetic," IEEE Micro 4:4, 86-100. Colwell, R. P., and R.

Steck [1995]. "A 0.6 µm BiCMOS processor with dynamic execution." Proc. of IEEE Int'l. Symposium on Solid State Circuits (ISSCC), February 15-17, 1995, San Francisco, 176-177. Colwell, R. P., R.

Nix, J. J. O'Donnell, D. B. Papworth, and P. K. Rodman [1987]. "A VLIW architecture for a trace scheduling compiler," Proc. Second Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 5-8, 1987, Palo Alto, Calif., 180-192. Comer, D. [1993]. Internetworking with TCP/IP, 2nd ed., Prentice Hall, Englewood Cliffs, N. J. Compag Computer Corporation. [1999].

Compiler Writer's Guide for the Alpha 21264, Order Number EC-RJ66A-TE, June, www1.support.compaq.com/alpha-tools/documentation/current/21264_EV67/ec-rj66a-te_comp_writ_gde_for_alpha21264.pdf. Conti, C., D. H. Gibson, and S. H. Pitkowsky [1968]. "Structural aspects of the System/360 Model 85. Part I. General organization," IBM Systems J. 7:1, 2-14. Coonen, J. [1984]. "Contributions to a Proposed Standard for Binary Floating-Point Arithmetic," Ph.D. thesis, University of California, Berkeley. Corbett, P., B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar [2004]. "Row-diagonal parity for double disk failure correction," Proc. 3rd USENIX Conf. on File and Storage Technology (FAST '04), March 31-April 2, 2004, San Francisco. Crawford, J., and P. Gelsinger [1988]. Programming the 80386, Sybex Books, Alameda, Calif. Culler, D. E., J. P. Singh, and A. Gupta [1999]. Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, San Francisco. Curnow, H. J., and B. A. Wichmann [1976]. "A synthetic benchmark," The Computer J. 19:1, 43-49. Cvetanovic, Z., and R. E. Kessler [2000]. "Performance analysis of the Alpha 21264-based Compaq ES40 system," Proc. 27th Annual Int'l. Symposium on Computer Architecture (ISCA).

Dally, W. J. [1990]. "Performance analysis of k-ary n-cube interconnection networks," IEEE Trans. on Computers 39:6 (June), 775-785. Dally, W. J. [1992]. "Virtual channel flow control," IEEE Trans. on Parallel and Distributed Systems 3:2 (March), 194-205. Dally, W. J. [1999]. "Interconnect limited VLSI architecture," Proc. of the International Interconnect Technology Conference, May 24-26, 1999, San Francisco. Dally, W. J., and C. L. Seitz [1986]. "The torus routing chip," Distributed Computing 1:4, 187-196. Dally, W. J., and B. Towles [2001]. "Route packets, not wires: On-chip interconnection networks," Proc. 38th Design Automation Conference, June 18-22, 2001, Las Vegas. Dally, W. J., and B. Towles [2003]. Principles and Practices of Interconnection Networks, Morgan Kaufmann, San Francisco. Darcy, J. D., and D. Gay [1996]. "FLECKmarks: Measuring floating point performance using a full IEEE compliant arithmetic benchmark," CS 252 class project, University of California, Berkeley (see HTTP.CS.Berkeley.EDU/~darcy/Projects/cs252/). Darley, H. M. et al. [1989]. "Floating Point/Integer Processor with Divide and Square Root Functions," U.S. Patent 4,878,190, October 31. Davidson, E. S. [1971]. "The design and control of pipelined function generators," Proc. IEEE Conf. on Systems, Networks, and Computers, January 19-21, 1971, Oaxtepec, Mexico, 19-21. Davidson, E. S., A. T. Thomas, L. E. Shar, and J. H. Patel [1975]. "Effective control for pipelined processors," Proc. IEEE COMPCON, February 25-27, 1975, San Francisco, 181-184. Davie, B. S., L. L. Peterson, and D. Clark [1999]. Computer Networks: A Systems Approach, 2nd ed., Morgan Kaufmann, San Francisco.

Dean, J. [2009]. "Designs, lessons and advice from building large distributed systems [keynote address]," Proc. 3rd ACM SIGOPS Int'l. Workshop on Large-Scale Distributed Systems and Middleware, co-located with the 22nd ACM Symposium on Operating Systems Principles, October 11-14, 2009, Big Sky, Mont. Dean, J., and S. Ghemawat [2004]. "MapReduce: Simplified data processing on large clusters," Proc. Operating Systems Design and Implementation (OSDI), December 6-8, 2004, San Francisco, Calif., 137-150. Dean, J., and S. Ghemawat [2008]. "MapReduce: Simplified data processing on large clusters," Communications of the ACM 51:1, 107-113. DeCandia, G., D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels [2007]. "Dynamo: Amazon's highly available key-value store," Proc. 21st ACM Symposium on Operating Systems Principles, October 14-17, 2007, Stevenson, Wash. Dehnert, J. C., P. Y.-T. Hsu, and J. P. Bratt [1989]. "Overlapped loop support on the Cydra 5," Proc. Third Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 3-6, 1989, Boston, Mass., 26-39. Demmel, J. W., and X. Li [1994]. "Faster numerical algorithms via exception handling," IEEE Trans. on Computers 43:8, 983-992. Denehy, T. E., J. Bent, F. I. Popovici, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau [2004]. "Deconstructing storage arrays," Proc. 11th Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 7-13, 2004, Boston, Mass., 59-71.

Desurvire, E. [1992]. "Lightwave communications: The fifth generation," Scientific American (International Edition) 266:1 (January), 96-103. Diep, T. A., C. Nelson, and J. P. Shen [1995]. "Performance evaluation of the PowerPC 620 microarchitecture," Proc. 22nd Annual Int'l. Symposium on Computer Architecture (ISCA), June 22-24, 1995, Santa Margherita, Italy. Digital Semiconductor [1996]. Alpha Architecture Handbook, Version 3, Digital Press, Maynard, Mass. Ditzel, D. R., and H. R. McLellan [1987]. "Branch folding in the CRISP microprocessor: Reducing the branch delay to zero." Ditzel, D. R., and D. A. Patterson [1980]. "Retrospective on high-level language computer architecture," Proc. Seventh Annual Int'l. Symposium on Computer Architecture (ISCA), May 6-8, 1980, La Baule, France, 97-104. Doherty, W. J., and R. P. Kelisky [1979]. "Managing VM/CMS systems for user effectiveness," IBM Systems J. 18:1, 143-166. Dongarra, J. J. [1986]. "A survey of high performance processors," Proc. IEEE COMPCON, March 3-6, 1986, San Francisco, 8-11. Dongarra, J., T. Sterling, H. Simon, and E. Strohmaier [2005]. "High-performance computing: Clusters, constellations, MPPs, and future directions," Computing in Science & Engineering 7:2 (March/April), 51-59. Douceur, J. R., and W. J. Bolosky [1999]. "A large scale study of file-system contents," Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, May 1-9, 1999, Atlanta, Ga., 59-69. Douglas, J. [2005]. "Intel 8xx series and Paxville Xeon-MP microprocessors," paper presented at Hot Chips 17, August 14-16, 2005, Stanford University, Palo Alto, Calif.

Duato, J. [1993]. "A new theory of deadlock-free adaptive routing in wormhole networks," IEEE Trans. on Parallel and Distributed Systems 4:12 (December), 1320-1331. Duato, J., and T. M. Pinkston [2001]. "A general theory for deadlock-free adaptive routing using a mixed set of resources," IEEE Trans. on Parallel and Distributed Systems 12:12 (December), 1219-1235. Duato, J., S. Yalamanchili, and L. Ni [2003]. Interconnection Networks: An Engineering Approach, 2nd printing, Morgan Kaufmann, San Francisco. Duato, J., I. Johnson, J. Flich, F. Naven, P. Garcia, and T. Nachiondo [2005a]. "A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks," Proc. 11th Int'l. Symposium on High-Performance Computer Architecture, February 12-16, 2005, San Francisco. Duato, J., O. Lysne, R. Pang, and T. M. Pinkston [2005b]. "Part I: A theory for deadlock-free dynamic reconfiguration of interconnection networks," IEEE Trans. on Parallel and Distributed Systems 16:5 (May), 412-427. Dubois, M., C. Scheurich, and F. Briggs [1988]. "Synchronization, coherence, and event ordering," IEEE Computer 21:2 (February), 9-21. Dunigan, W., K. Vetter, K. White, and P. Worley [2005]. "Performance evaluation of the Cray X1 distributed shared memory architecture," IEEE Micro January/February, 30-40.

Eden, A., and T. Mudge [1998]. "The YAGS branch prediction scheme," Proc. of the 31st Annual ACM/IEEE Int'l. Symposium on Microarchitecture, November 30-December 2, 1998, Dallas, Tex., 69-80. Edmondson, J. H., P. I. Rubinfield, R. Preston, and V. Rajagopalan [1995]. "Superscalar instruction execution in the 21164 Alpha microprocessor," IEEE Micro 15:2, 33-43. Eggers, S. [1989]. "Simulation Analysis of Data Sharing in Shared Memory Multiprocessors," Ph.D. thesis, University of California, Berkeley. Elder, J., A. Gottlieb, C. K. Kruskal, K. P. McAuliffe, L. Randolph, M. Snir, P. Teller, and J. Wilson [1985]. "Issues related to MIMD shared-memory computers: The NYU Ultracomputer approach," Proc. 12th Annual Int'l. Symposium on Computer Architecture (ISCA), June 17-19, 1985, Boston, Mass., 126-135. Ellis, J. R. [1986]. Bulldog: A Compiler for VLIW Architectures, MIT Press, Cambridge, Mass. Emer, J. S., and D. W. Clark [1984]. "A characterization of processor performance in the VAX-11/780," Proc. 11th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1984, Ann Arbor, Mich., 301-310. Enriquez, P. [2001]. "What happened to my dial tone? A study of FCC service disruption reports," poster, Richard Tapia Symposium on the Celebration of Diversity in Computing, October 18-20, Houston, Tex. Erlichson, A., N. Nuckolls, G. Chesson, and J. L. Hennessy [1996]. "SoftFLASH: Analyzing the performance of clustered distributed virtual shared memory," Proc. Seventh Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1-5, 1996, Cambridge, Mass., 210-220. Esmaeilzadeh, H., T. Cao, Y. Xi, S. M. Blackburn, and K. S. McKinley [2011]. "Looking Back on the Language and Hardware Revolution: Measured Power, Performance, and Scaling," Proc. 16th Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 5-11, 2011, Newport Beach, Calif. Evers, M., S. J. Patel, R. S. Chappell, and Y. N. Patt [1998]. "An analysis of correlation and predictability: What makes two-level branch predictors work," Proc. 25th Annual Int'l. Symposium on Computer

Architecture (ISCA), July 3-14, 1998, Barcelona, Spain, 52-61. Fabry, R. S. [1974]. "Capability based addressing," Communications of the ACM 17:7 (July), 403-412. Falsafi, B., and D. A. Wood [1997]. "Reactive NUMA: A design for unifying S-COMA and CC-NUMA," Proc. 24th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-4, 1997, Denver, Colo., 229-240. Fan, X., W. Weber, and L. A. Barroso [2007]. "Power provisioning for a warehouse-sized computer," Proc. 34th Annual Int'l. Symposium on Computer Architecture (ISCA), June 9-13, 2007, San Diego, Calif. Farkas, K. I., and N. P. Jouppi [1994]. "Complexity/performance trade-offs with nonblocking loads," Proc. 21st Annual Int'l. Symposium on Computer Architecture (ISCA), April 18-21, 1994, Chicago. Farkas, K. I., N. P. Jouppi, and P. Chow [1995]. "How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors?," Proc. First IEEE Symposium on High-Performance Computer Architecture, January 22-25, 1995, Raleigh, N.C., 78-89. Farkas, K. I., P. Chow, N. P. Jouppi, and Z. Vranesic [1997]. "Memory-system design considerations for dynamically-scheduled processors," Proc. 24th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-4, 1997, Denver, Colo. Fazio, D. [1987]. "It's really much more fun building a supercomputer than it is simply inventing one," Proc. IEEE COMPCON, February 23-27, 1987, San Francisco, 102-105.

Fisher, J. A. [1981]. "Trace scheduling: A technique for global microcode compaction," IEEE Trans. on Computers 30:7 (July), 478-490. Fisher, J. A. [1983]. "Very long instruction word architectures and ELI-512," 10th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1982, Stockholm, Sweden, 140-150. Fisher, J. A., and S. M. Freudenberger [1992]. "Predicting conditional branches from previous runs of a program," Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 12-15, 1992, Boston, Mass., 85-95. Fisher, J. A., J. R. Ellis, J. C. Ruttenberg, and A. Nicolau [1984]. "Parallel processing: A smart compiler and a dumb processor," Proc. SIGPLAN Conf. on Compiler Construction, June 17-22, 1984, Montreal, Canada, 11-16. Flemming, P. J., and J. J. Wallace [1986]. "How not to lie with statistics: The correct way to summarize benchmarks results," Communications of the ACM 29:3 (March), 218-221. Flynn, M. J. [1966]. "Very high-speed computing systems," Proc. IEEE 54:12 (December), 1901-1909. Forgie, J. W. [1957]. "The Lincoln TX-2 input-output system," Proc. Western Joint Computer Conference (February), Institute of Radio Engineers, Los Angeles, 156-160. Foster, C. C., and E. M. Riseman [1972]. "Percolation of code to enhance parallel dispatching and execution," IEEE Trans. on Computers C-21:12 (December), 1411-1415. Frank, S. J. [1984]. "Tightly coupled multiprocessor systems speed memory access time," Electronics 57:1 (January), 164-169. Freiman, C. V.

[1961]. "Statistical analysis of certain binary division algorithms," Proc. IRE 49:1, 91-103. Friesenborg, S. E., and R. J. Wicks [1985]. DASD Expectations: The 3380, 3380-23, and MVS/XA, Tech. Bulletin GG22-9363-02, IBM Washington Systems Center, Gaithersburg, Md. Fuller, S. H., and W. E. Burr [1977]. "Measurement and evaluation of alternative computer architectures," Computer 10:10 (October), 24-35. Furber, S. B. [1996]. ARM System Architecture, Addison-Wesley, Harlow, England (see www.cs.man.ac.uk/amulet/publications/books/ARMsysArch).

Gagliardi, U. O. [1973]. "Report of workshop 4--software-related advances in computer hardware," Proc. Symposium on the High Cost of Software, September 17-19, 1973, Monterey, Calif., 99-120. Gajski, D., D. Kuck, D. Lawrie, and A. Sameh [1983]. "CEDAR--a large scale multiprocessor," Proc. Int'l. Conf. on Parallel Processing (ICPP), August, Columbus, Ohio, 524-529. Gallagher, D. M., W. Y. Chen, S. A. Mahlke, J. C. Gyllenhaal, and W. W. Hwu [1994]. "Dynamic memory disambiguation using the memory conflict buffer," Proc. Sixth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 4-7, San Jose, Calif., 183-193. Galles, M. [1996]. "Scalable pipelined interconnect for distributed endpoint routing: The SGI SPIDER chip," Proc. IEEE HOT Interconnects '96, August 15-17, 1996, Stanford University, Palo Alto, Calif. Game, M., and A. Booker [1999]. "CodePack code compression for PowerPC processors," MicroNews 5:1, www.chips.ibm.com/micronews/vol5_no1/codepack.html. Gao, Q. S. [1993]. "The Chinese remainder theorem and the prime memory system," 20th Annual Int'l. Symposium on Computer Architecture (ISCA), May 16-19, 1993, San Diego, Calif. (Computer Architecture News 21:2 (May), 337-340). Gap [2005]. "Gap Inc. Reports Third Quarter Earnings." Gap [2006]. "Gap Inc. Reports Fourth Quarter and Full Year Earnings." Garner, R., A. Agarwal, F. Briggs, E. Brown, D. Hough, B. Joy, S. Kleiman, S. Muchnick, M. Namjoo, D. Patterson, J. Pendleton, and R. Tuck [1988]. "Scalable processor architecture (SPARC)," Proc. IEEE COMPCON. Gebis, J., and D. Patterson [2007]. "Embracing and extending 20th-century instruction set architectures," IEEE Computer 40:4 (April), 68-75. Gee, J. D., M. D. Hill, D. N. Pnevmatikatos, and A. J. Smith [1993]. "Cache performance of the SPEC92 benchmark suite," IEEE Micro 13:4 (August), 17-27. Gehringer, E. F., D. P. Siewiorek, and Z. Segall [1987]. Parallel Processing: The Cm\* Experience, Digital Press, Bedford, Mass.

Gharachorloo, K., A. Gupta, and J. L. Hennessy [1992]. "Hiding memory latency using dynamic scheduling in shared-memory multiprocessors," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia. Gharachorloo, K., D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. L. Hennessy [1990]. "Memory consistency and event ordering in scalable shared-memory multiprocessors," Proc. 17th Annual Int'l. Symposium on Computer Architecture (ISCA), May 28-31, 1990, Seattle, Wash., 15-26. Ghemawat, S., H. Gobioff, and S.-T. Leung [2003]. "The Google file system," Proc. 19th ACM Symposium on Operating Systems Principles, October 19-22, 2003, Bolton Landing, N.Y. Gibson, D. H. [1967]. "Considerations in block-oriented systems design," AFIPS Conf. Proc. 30, 75-80. Gibson, G. A. [1992]. Redundant Disk Arrays: Reliable, Parallel Secondary Storage, ACM Distinguished Dissertation Series, MIT Press, Cambridge, Mass. Gibson, J. C. [1970]. "The Gibson mix," Rep. TR. 00.2043, IBM Systems Development Division, Poughkeepsie, N.Y. (research done in 1959). Gibson, J., R. Kunz, D. Ofelt, M. Horowitz, J. Hennessy, and M. Heinrich [2000]. "FLASH vs. (simulated) FLASH: Closing the simulation loop," Proc. Ninth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), November 12-15, Cambridge, Mass., 49-58. Glass, C. J., and L. M. Ni [1992]. "The Turn Model for adaptive routing," 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia.

Goldberg, D. [1991]. "What every computer scientist should know about floating-point arithmetic," Computing Surveys 23:1, 5-48. Goldberg, I. B. [1967]. "27 bits are not enough for 8-digit accuracy," Communications of the ACM 10:2, 105-106. Goldstein, S. [1987]. Storage Performance--An Eight Year Outlook, Tech. Rep. TR 03.308-1, Santa Teresa Laboratory, IBM Santa Teresa Laboratory, San Jose, Calif. Goldstine, H. H. [1972]. The Computer: From Pascal to von Neumann, Princeton, N.J. Gonzalez, J., and A. González [1998]. "Limits of instruction level parallelism with data speculation," Proc. Vector and Parallel Processing (VECPAR) Conf., June 21-23, 1998, Porto, Portugal, 585-598. Goodman, J. R. [1983]. "Using cache memory to reduce processor memory traffic," Proc. 10th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1982, Stockholm, Sweden, 124-131. Goralski, W. [1997]. SONET: A Guide to Synchronous Optical Network, McGraw-Hill, New York. Gosling, J. B. [1980]. Design of Arithmetic Units for Digital Computers, Springer-Verlag, New York. Gray, J. [1990]. "A census of Tandem system availability between 1985 and 1990," IEEE Trans. on Reliability 39:4 (October), 409-418. Gray, J. (ed.) [1993]. The Benchmark Handbook for Database and Transaction Processing Systems, 2nd ed., Morgan Kaufmann, San Francisco. Gray, J. [2006]. Sort benchmark home page. Gray, J., and A. Reuter [1993]. Transaction Processing: Concepts and Techniques, Morgan Kaufmann, San Francisco. Gray, J., and D. P. Siewiorek [1991]. "High-availability computer systems," Computer 24:9 (September), 39-48. Gray, J., and C. van Ingen [2005]. Empirical Measurements of Disk Failure Rates and Error Rates, MSR-TR-2005-166, Microsoft Research, Redmond, Wash. Greenberg, A., N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, and S. Sengupta [2009]. "VL2: A Scalable and Flexible Data Center Network," in Proc. ACM SIGCOMM, August 17-21, 2009, Barcelona, Spain. Grice, C., and M. Kanellos [2000]. "Cell phone industry at crossroads: Go high or low?," CNET News, August 31, technews.netscape.com/news/0-1004-201-2518386-0.html?tag=st.ne.1002.tgif.sf. Groe, J. B., and L. E. Larson [2000]. CDMA Mobile Radio Design, Artech House, Boston. Gunther, K. D. [1981]. "Prevention of deadlocks in packet-switched data transport systems," IEEE Trans. on Communications COM-29:4 (April), 512-524.

Hagersten, E., and M. Koster [1998]. "WildFire: A scalable path for SMPs," Proc. Fifth Int'l. Symposium on High-Performance Computer Architecture, January 9-12, 1999, Orlando, Fla. Hagersten, E., A. Landin, and S. Haridi [1992]. "DDM--a cache-only memory architecture," IEEE Computer 25:9 (September), 44-54. Hamacher, V. C., Z.

G. Vranesic, and S. G. Zaky [1984]. Computer Organization, 2nd ed., McGraw-Hill, New York. Hamilton, J. [2009]. "Data center networks are in my way," paper presented at the Stanford Clean Slate CTO Summit, October 23, 2009. Hamilton, J. [2010]. "Cloud computing economies of scale," paper presented at the AWS Workshop on Genomics and Cloud Computing, June 8, 2010, Seattle, Wash. Handy, J. [1993]. The Cache Memory Book, Academic Press, Boston. Hauck, E. A., and B. A. Dent [1968]. "Burroughs' B6500/B7500 stack mechanism," Proc. AFIPS Spring Joint Computer Conf., April 30-May 2, 1968, Atlantic City, N.J., 245-251. Heald, R., K. Aingaran, C. Amir, M. Ang, M. Boland, A. Das, P. Dixit, G. Gouldsberry, J. Hart, T. Horel, W.-J. Hsu, J. Kaku, C. Kim, S. Kim, F. Klass, H. Kwan, R. Lo, H. McIntyre, A. Mehta, D. Murata, S. Nguyen, Y.-P. Pai, S. Patel, K. Shin, K. Tam, S. Vishwanthaiah, J. Wu, G. Yee, and H. You [2000]. "Implementation of third-generation SPARC V9 64-b microprocessor," ISSCC Digest of Technical Papers, 412-413 and slide supplement. Heinrich, J. [1993]. MIPS R4000 User's Manual, Prentice Hall, Englewood Cliffs, N.J. Henly, M., and B. McNutt [1989]. DASD I/O Characteristics: A Comparison of MVS to VM, Tech. Rep. TR 02.1550 (May), IBM General Products Division, San Jose, Calif. Hennessy, J. [1984]. "VLSI processor architecture," IEEE Trans. on Computers C-33:11 (December), 1221-1246.

Hennessy, J. [1985]. "VLSI RISC processors," VLSI Systems Design 6:10 (October), 22-32. Hennessy, J., N. Jouppi, F. Baskett, and J. Gill [1981]. "MIPS: A VLSI processor architecture," in CMU Conference on VLSI Systems and Computations, Computer Science Press, Rockville, Md. Hewlett-Packard [1994]. PA-RISC 2.0 Architecture Reference Manual, 3rd ed., Hewlett-Packard, Palo Alto, Calif. Hewlett-Packard [1998]. "HP's '5NINES:5MINUTES' Vision Extends Leadership and Redefines High Availability in Mission-Critical Environments," February 10, www.future.enterprisecomputing.hp.com/ia64/news/5nines_vision_pr.html. Hill, M. D. [1987]. "Aspects of Cache Memory and Instruction Buffer Performance," Ph.D. thesis, Tech. Rep. UCB/CSD 87/381, Computer Science Division, University of California, Berkeley. Hill, M. D. [1988]. "A case for direct mapped caches," Computer 21:12 (December), 25-40. Hill, M. D. [1998]. "Multiprocessors should support simple memory consistency models," IEEE Computer 31:8 (August), 28-34. Hillis, W. D. [1985]. The Connection Machine, MIT Press, Cambridge, Mass. Hillis, W. D., and G. L. Steele [1986]. "Data parallel algorithms," Communications of the ACM 29:12 (December), 1170-1183. Hinton, G., D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel [2001]. "The microarchitecture of the Pentium 4 processor," Intel Technology Journal, February. Hintz, R. G., and D. P. Tate [1972]. "Control data STAR-100 processor design," Proc. IEEE COMPCON, September 12-14, 1972, San Francisco, 1-4. Hirata, H., K. Kimura, S. Nagamine, Y. Mochizuki, A. Nishimura, Y. Nakase, and T. Nishizawa [1992]. "An elementary processor architecture with simultaneous instruction issuing from multiple threads," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 136-145. Hitachi [1997]. SuperH RISC Engine SH7700 Series Programming Manual, Hitachi, Santa Clara, Calif. (see www.halsp.hitachi.com/tech_prod/ and search for title).

Ho, R., K. W. Mai, and M. A. Horowitz [2001]. "The future of wires," Proc. of the IEEE 89:4 (April), 490-504. Hoagland, A. S. [1963]. Digital Magnetic Recording, Wiley, New York. Hockney, R. W., and C. R. Jesshope [1988]. Parallel Computers 2: Architectures, Programming and Algorithms, Adam Hilger, Ltd., Bristol, England. Holland, J. H. [1959]. "A universal computer capable of executing an arbitrary number of subprograms simultaneously," Proc. East Joint Computer Conf. 16, 108-113. Holt, R. C. [1972]. "Some deadlock properties of computer systems," ACM Computer Surveys 4:3 (September), 179-196. Hopkins, M. [2000]. "A critical look at IA-64: Massive resources, massive ILP, but can it deliver?" Microprocessor Report, February. Hord, R. M. [1982]. The Illiac-IV, The First Supercomputer, Computer Science Press, Rockville, Md. Horel, T., and G. Lauterbach [1999]. "UltraSPARC-III: Designing third-generation 64-bit performance," IEEE Micro 19:3 (May-June), 73-85. Hospodor, A. D., and A. S. Hoagland [1993]. "The changing nature of disk controllers," Proc. IEEE 81:4 (April), 586-594. Holzle, U. [2010]. "Brawny cores, most of the time," IEEE Micro 30:4 (July/August). Hristea, C., D. Lenoski, and J. Keen [1997]. "Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks," Proc. ACM/IEEE Conf. on Supercomputing, November 16-21, 1997, San Jose, Calif. Hsu, P. [1994]. "Designing the TFP microprocessor," IEEE Micro 18:2 (April), 23-33. Huck, J. et al. [2000]. "Introducing the IA-64 Architecture," IEEE Micro 20:5 (September-October), 12-23. Hughes, C. J., P. Kaul, S. V. Adve, R. Jain, C. Park, and J. Srinivasan [2001]. "Variability in the execution of multimedia applications and implications for architecture," Proc. 28th Annual Int'l. Symposium on Computer Architecture (ISCA), June 30-July 4, 2001, Goteborg, Sweden, 254-265. Hwang, K. [1993]. Advanced Computer Architecture and Parallel Programming, McGraw-Hill, New York. Hwu, W.-M., and Y. Patt [1986]. "HPSm, a high performance restricted data flow architecture having minimum functionality," Proc. 13th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-5, 1986, Tokyo, 297-307. Hwu, W. W., S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. O. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery [1993]. "The superblock: An effective technique for VLIW and superscalar compilation," J. Supercomputing 7:1, 2 (March), 229-248. IBM [1990]. "The IBM RISC System/6000 processor" (collection of papers),

IBM J. Research and Development 34:1 (January). IBM [1994]. The PowerPC Architecture, Morgan Kaufmann, San Francisco. IBM [2005]. "Blue Gene," IBM J. Research and Development 49:2/3 (special issue). IEEE [1985]. "IEEE standard for binary floating-point arithmetic," SIGPLAN Notices 22:2, 9-25. IEEE [2005]. "Intel virtualization technology, computer," IEEE Computer Society 38:5 (May), 48-56. IEEE 754-2008 Working Group [2006]. "DRAFT Standard for Floating-Point Arithmetic 754-2008." Imprimis Product Specification, 97209 Sabre Disk Drive IPI-2 Interface 1.2 GB, Document No. 64402302, Imprimis, Dallas, Tex. InfiniBand Trade Association [2001]. InfiniBand Architecture Specifications Release 1.0.a. Intel [2001]. "Using MMX Instructions to Convert RGB to YUV Color Conversion," cedar.intel.com/cgi-bin/ids.dll/content,jsp?cntKey=Legacy::irtm_AP548_9996&cntType=IDS_EDITORIAL. Internet Retailer [2005]. "The Gap launches a new site--after two weeks of downtime," Internet Retailer, September 28.

Jain, R. [1991]. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley, New York. Jantsch, A., and H. Tenhunen (eds.) [2003]. Networks on Chips, Kluwer Academic Publishers, The Netherlands. Jimenez, D. A., and C. Lin [2002]. "Neural methods for dynamic branch prediction," ACM Trans. on Computer Systems 20:4 (November), 369-397. Johnson, M. [1990]. Superscalar Microprocessor Design, Prentice Hall, Englewood Cliffs, N.J. Jordan, H. F. [1983]. "Performance measurements on HEP--a pipelined MIMD computer," Proc. 10th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1982, Stockholm, Sweden, 207-212. Jordan, K. E. [1987]. "Performance comparison of large-scale scientific processors: Scalar mainframes, mainframes with vector facilities, and supercomputers," Computer 20:3 (March), 10-23. Jouppi, N. P. [1990]. "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," Proc. 17th Annual Int'l. Symposium on Computer Architecture (ISCA), May 28-31, 1990, Seattle, Wash., 364-373. Jouppi, N. P. [1998]. "Retrospective: Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," 25 Years of the International Symposia on Computer Architecture (Selected Papers), ACM, New York, 71-73. Jouppi, N. P., and D. W. Wall [1989]. "Available instruction-level parallelism for superscalar and superpipelined processors," Proc. Third Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 3-6, 1989, Boston, 272-282. Jouppi, N. P., and S. J. E. Wilton [1994]. "Trade-offs in two-level on-chip caching," Proc. 21st Annual Int'l. Symposium on Computer Architecture (ISCA), April 18-21, 1994, Chicago, 34-45. Kaeli, D.

R., and P. G. Emma [1991]. "Branch history table prediction of moving target branches due to subroutine returns," Proc. 18th Annual Int'l. Symposium on Computer Architecture (ISCA), May 27-30, 1991, Toronto, Canada, 34-42. Kahan, J. [1990]. "On the advantage of the 8087's stack," unpublished course notes, Computer Science Division, University of California, Berkeley, Kahan, W. [1968]. "7094-II system support for numerical analysis," SHARE Secretarial Distribution SSD-159, Department of Computer Science, University of Toronto, Kahaner, D. K. [1988]. "Benchmarks for 'real' programs," SIAM News, November, Kahan, R. E. [1972]. "Resource-sharing computer communication networks," Proc. IEEE 60:11 (November), 1397-1407.Kane, G. [1986]. MIPS R2000 RISC Architecture, Prentice Hall, Englewood Cliffs, N. J.Kane, G.

[1996]. PA-RISC 2.0 Architecture, Prentice Hall, Upper Saddle River, N. J. Kane, G., and J. Heinrich [1992]. MIPS RISC Architecture for high performance computing," Proc. IEEE 77:12 (December), 1842-1858. Keckler, S. W., and W. J. Dally [1992]. "Processor coupling: Integrating compile time and runtime scheduling for parallelism," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 202-213. Keller, R. M. [1975]. "Look-ahead processors," ACM Computing Surveys 7:4 (December), 177-195. Keltcher, C. N., K. J. McGrath, A. Ahmed, and P. Conway [2003]. "The AMD Opteron processor for multiprocessor servers," IEEE Micro 23:2 (March-April), 66-76 (dx.doi.org/10.1109. MM.2003.119116).

Kembel, R. [2000]. "Fibre Channel: A comprehensive introduction," Internet Week, April. Kermani, P., and L. Kleinrock [1979]. "Virtual Cut-Through: A New Computer Communication Switching Technique," Computer Networks 3 (January), 267-286. Kessler, R. [1999]. "The Alpha 21264 microprocessor," IEEE Micro 19:2 (March/April) 24-36. Kilburn,

T., D. B. G. Edwards, M. J. Lanigan, and F. H. Sumner [1962]. "One-level storage system," IRE Trans. on Electronic Computers EC-11 (April) 223-235. Also appears in D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982, 135-148.

Killian, E. [1991]. "MIPS R4000 technical overview-64 bits/100 MHz or bust," Hot Chips III Symposium Record, August 26-27, 1991, Stanford University, Palo Alto, Calif., 1.6-1.19. Kim, M. Y. [1986]. "Synchronized disk interleaving," IEEE Trans. on Computers C-35:11 (November), 978-988. Kissell, K. D. [1997]. "MIPS16: High-density for the embedded market," Proc. Real Time Systems '97, June 15, 1997, Las Vegas, Nev. (see www.sqi.com/MIPS/arch/MIPS16/MIPS16.whitepaper.pdf). Kitagawa, K., S.

Tagaya, Y. Hagihara, and Y. Kanoh [2003]. "A hardware overview of SX-6 and SX-7 supercomputer," NEC Research & Development J. 44:1 (January), 2-7. Knuth, D.

[1981]. The Art of Computer Programming, Vol. II, 2nd ed., Addison-Wesley, Reading, Mass. Kogge, P. M.

[1981]. The Architecture of Pipelined Computers, McGraw-Hill, New York. Kohn, L., and S.-W. Fu [1989]. "A 1,000,000 transistor microprocessor," Proc. of IEEE Int'l. Symposium on Solid State Circuits (ISSCC), February 15-17, 1989, New York, 54-55. Kohn, L., and N. Margulis [1989]. "Introducing the Intel i860 64-Bit Microprocessor," IEEE Micro 9:4 (July), 15-30. Kontothanassis, L., G. Hunt, R. Stets, N. Hardavellas, M. Cierniak, S. Parthasarathy, W. Meira, S. Dwarkadas, and M. Scott [1997]. "VM-based shared memory on low-latency, remote-memory-access networks," Proc. 24th Annual Int'l. Symposium on Computer Arithmetic Algorithms, Prentice Hall, Englewood Cliffs, N. J. Kozyrakis, C. [2000]. 
"Vector IRAM: A media-oriented vector processor with embedded DRAM," paper presented at Hot Chips 12, August 13-15, 2000, Palo Alto, Calif, 13-15. Kozyrakis, C., and D. Patterson, [2002]. "Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks," Proc. 35th Annual Int'l. Symposium on Microarchitecture (MICRO-35), November 18-

22, 2002, Istanbul, Turkey. Kroft, D. [1981]. "Lockup-free instruction fetch/prefetch cache organization," Proc. Eighth Annual Int'l. Symposium on Computer Architecture (ISCA), May 12-14, 1981, Minneapolis, Minn., 81-87. Kroft, D. [1998]. "Retrospective: Lockup-free instruction fetch/prefetch cache organization," 25 Years of the International Symposia on Computer Architecture (Selected Papers), ACM, New York, 20-21. Kuck, D., P. P. Budnik, S.-C. Chen, D. H. Lawrie, R. A. Towle, R. E. Strebendt, E. W. Davis, Jr., J. Han, P. W. Kraska, and Y. Muraoka [1974]. "Measurements of parallelism in ordinary FORTRAN programs," Computer 7:1 (January), 37-46. Kuhn, D. R. [1997]. "Sources of failure in the public switched telephone network," IEEE Computer 30:4 (April), 31-36. Kumar, A. [1997]. "The HP PA-8000 RISC CPU," IEEE Micro 17:2 (March/April), 27-32. Kunimatsu, A., N. Ide, T. Sato, Y. Endo, H. Murakami, T. Kamei, M. Hirano, F. Ishihara, H. Tago, M. Oka, A. Ohba, T. Yutaka, T. Okada, and M. Suzuoki [2000]. "Vector unit architecture for emotion synthesis," IEEE Micro 20:2 (March-April), 40-47. Kunkel, S. R., and J. E. Smith [1986]. "Optimal pipelining in supercomputers," Proc. 13th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-5, 1986, Tokyo, 404-414. Kurose, J. F., and K. W.

Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Boston. Kuskin, J., D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. L. Hennessy [1994]. "The Stanford FLASH multiprocessor," Proc. 21st Annual Int'l. Symposium on Computer Architecture (ISCA), April 18-21, 1994, Chicago. Lam, M. [1988]. "Software pipelining: An effective scheduling technique for VLIW processors," SIGPLAN Conf. on Programming Language Design and Implementation, June 22-24, 1988, Atlanta, Ga., 318-328. Lam, M. S., and R.

Wilson [1992]. "Limits of control flow on parallelism," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 46-57. Lam, M. S., E. E. Rothberg, and M. E. Wolf [1991]. "The cache performance and optimizations of blocked algorithms," Proc. Fourth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 8-11, 1991, Santa Clara, Calif. (SIGPLAN Notices 26:4 (April), 63-74). Lambright, D. [2000]. "Experiences in measuring the reliability of a cache-based storage system," Proc. of First Workshop on Industrial Experiences with Systems Design and Implementation (OSDI), October 22, 2000, San Diego, Calif. Lamport, L. [1979]. "How to make a multiprocessor computer that correctly executes multiprocess programs," IEEE Trans. on Computers C-28:9 (September), 241-248. Lang, W., J. M.

Patel, and S. Shankar [2010]. "Wimpy node clusters: What about non-wimpy workloads?" Proc. Sixth International Workshop on Data Management on New Hardware (DaMoN), June 7, Indianapolis, Ind. Laprie, J.-C. [1985]. "Dependable computing and fault tolerance: Concepts and terminology," Proc. 15th Annual Int'l. Symposium on Fault-Tolerant Computing, June 19-21, 1985, Ann Arbor, Mich., 2-11. Larson, E. R. [1973]. "Findings of fact, conclusions of law, and order for judgment," File No. 4-67, Civ. 138, Honeywell v. Sperry-Rand and Illinois Scientific Development, U. S. District Court for the State of Minnesota, Fourth Division (October 19). Laudon, J., and D.

Lenoski [1997]. "The SGI Origin: A ccNUMA highly scalable server," Proc. 24th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-4, 1997, Denver, Colo., 241-251. Laudon, J., A. Gupta, and M. Horowitz [1994]. "Interleaving: A multithreading technique targeting multiprocessors and workstations," Proc. Sixth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 4-7, San Jose, Calif., 308-318. Lauterbach, G., and T. Horel [1999]. "UltraSPARC-III: Designing third generation 64-bit performance," IEEE Micro 19:3 (May/June).

Lazowska, E. D., J. Zahorjan, G. S. Graham, and K. C. Sevcik [1984]. Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice Hall, Englewood Cliffs, N. J. (Although out of print, it is available online at www.cs.washington.edu/homes/lazowska/qsp/.) Lebeck, A. R., and D. A. Wood [1994]. "Cache profiling and the SPEC benchmarks: A case study," Computer 27:10 (October), 15-26. Lee, R. [1989]. "Precision architecture," Computer 22:1 (January), 78-91. Lee, W. V. et al. [2010]. "Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU," Proc. 37th Annual Int'l. Symposium on Computer Architecture (ISCA), June 19-23, 2010, Saint-Malo, France. Leighton, F. T. [1992]. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Francisco. Leiner, A. L. [1954]. "System specifications for the DYSEAC," J. ACM 1:2 (April), 57-81. Leiner, A. L., and S. N. Alexander [1954]. "System organization of the DYSEAC," IRE Trans. of Electronic Computers EC-3:1 (March), 1-10. Leiserson, C. E. [1985]. "Fat trees: Universal networks for hardware-efficient supercomputing," IEEE Trans. on Computers C-34:10 (October), 892-901. Lenoski, D., J. Laudon, K. Gharachorloo, A. Gupta, and J. L. Hennessy [1990]. "The Stanford

DASH multiprocessor," Proc. 17th Annual Int'l. Symposium on Computer Architecture (ISCA), May 28-31, 1990, Seattle, Wash., 148-159. Lenoski, D., J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. L. Hennessy, M. A. Horowitz, and M. Lam [1992]. "The Stanford DASH multiprocessor," IEEE Computer 25:3 (March), 63-79. Levy, H., and R. Eckhouse [1989]. Computer Programming and

Architecture: The VAX, Digital Press, Boston. Li, K. [1988]. "IVY: A shared virtual memory system for parallel computing," Proc. 1988 Int'l. Conf. on Parallel Processing, Pennsylvania State University Press, University Park, Penn. Li, S., K. Chen, J. B. Brockman, and N. Jouppi [2011]. "Performance Impacts of Nonblocking Caches in Out-of-order Processors," HP Labs Tech Report HPL-2011-65. Lim, K., P. Ranganathan, J. Chang, C. Patel, T. Mudge, and S. Reinhardt [2008]. "Understanding and designing new system architectures for emerging warehouse-computing environments," Proc. 35th Annual Int'l. Symposium on Computer Architecture. offs in the creation of a modern supercomputer," IEEE Trans. on Computers C-31:5 (May), 363-376. Lindholm, T., and F. Yellin [1999]. The Java Virtual Machine Specification, 2nd ed., Addison-Wesley, Reading, Mass. (also available online at java.sun.com/docs/books/vmspec/). Lipasti, M. H., and J. P. Shen [1996]. 
"Exceeding the dataflow limit via value prediction," Proc. 29th Int'l. Symposium on Microarchitecture, December 2-4, 1996, Paris, France.

Lipasti, M. H., C. B. Wilkerson, and J. P. Shen [1996]. "Value locality and load value prediction," Proc. Seventh Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1-5, 1996, Cambridge, Mass., 138-147.

Liptay, J. S. [1968]. "Structural aspects of the System/360 Model 85, Part II: The cache," IBM Systems J. 7:1, 15-21. Lo, J., L. Barroso, S. Eggers, K. Gharachorloo, H. Levy, and S. Parekh [1998]. "An analysis of database workload performance on simultaneous multithreaded processors," Proc. 25th Annual Int'l. Symposium on Computer Architecture (ISCA), July 3-14, 1998, Barcelona, Spain, 39-50. Lo, J., S. Eggers, J. Emer, H. Levy, R.

Stamm, and D. Tullsen [1997]. "Converting threadlevel parallelism into instruction-level parallelism via simultaneous multithreading," ACM Trans. on Computer Systems 15:2 (August), 322-354. Lovett, T., and S. Thakkar [1988].

"The Symmetry multiprocessor system," Proc. 1988 Int'l. Conf. of Parallel Processing, University Park, Penn., 303-310. Lubeck, O., J. Moore, and R. Mendez [1985]. "A benchmark comparison of three supercomputers: Fujitsu VP-200, Hitachi S810/20, and Cray X-MP/2," Computer 18:12 (December), 10-24. Luk, C.-K., and T. C. Mowry [1999]. "Automatic compiler-inserted prefetching for pointer-based applications," IEEE Trans. on Computers 48:2 (February), 134-141. Lunde, A. [1977]. "Empirical evaluation of some features of instruction set processor architecture," Communications of the ACM 20:3 (March), 143-152. Luszczek, P., J. J. Dongarra, D. Koester, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey, and D. Takahashi [2005]. "Introduction to the HPC challenge benchmark suite," Lawrence Berkeley National Laboratory, Paper LBNL-57493 (April 25), repositories.cdlib.org/lbnl/LBNL-57493. and D. Zuras [1988]. "Integer multiplication and division on the HP precision architecture," IEEE Trans. on Computers 37:8, 980-990.

Mahlke, S. A., W. Y. Chen, W.-M. Hwu, B. R. Rau, and M. S. Schlansker [1992]. "Sentinel scheduling for VLIW and superscalar processors," Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 12-15, 1992, Boston, 238-247. Mahlke, S.

A., R. E. Hank, J. E. McCormick, D. I. August, and W. W. Hwu [1995]. "A comparison of full and partial predicated execution support for ILP processors," Proc. 22nd Annual Int'l. Symposium on Computer Architecture (ISCA), June 22-24, 1995, Santa Margherita, Italy, 138-149. Major, J. B.

[1989]. "Are queuing models within the grasp of the unwashed?," Proc. Int'l. Conf. on Management and Performance Evaluation of elementary functions on the IBM RISC System/6000 processor," IBM J. Research and Development 34:1, 111-119. Mathis, H.

J. Eickemeyer, and S. R. Kunkel [2005]. "Characterization of the multithreading (SMT) efficiency in Power5," IBM J. Research and Development, 49:4/5 (July/September), 555-564. McCalpin, J. [2005]. "STREAM: Sustainable Memory Bandwidth in High Performance Computers," www.cs.virginia.edu/stream/. McCalpin, J., D. Bailey, and D. Takahashi Introduction to the HPC Challenge Benchmark Suite, Paper LBNL-57493 Lawrence Berkeley National Laboratory, University of California, Berkeley, repositories.cdlib.org/lbnl/LBNL-57493. McCormick, J., and A. Knies [2002]. "A brief analysis of the SPEC CPU2000 benchmarks on the Intel Itanium 2 processor," paper presented at Hot Chips 14, August 18-20, 2002, Stanford University, Palo Alto, Calif. McFarling, S. [1989]. "Program optimization for instruction caches," Proc. Third Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 3-6, 1989, Boston, 183-191. McFarling, S. [1993]. Combining Branch Predictors, WRL Technical Note TN-36, Digital Western Research Laboratory, Palo Alto, Calif. McFarling, S., and J. Hennessy [1986]. "Reducing the cost of branches," Proc. 13th

Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-5, 1986, Tokyo, 396-403. McGhan, H., and M. O'Connor [1998]. "PicoJava: A direct execution engine for Java bytecode," Computer Conf., November 14-16, 1967, Washington, D.C., 413-417. McMahon, F. M. [1986]. "The Livermore FORTRAN Kernels: A Computer Test of Numerical Performance Range," Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, University of California, Livermore National Laboratory, University [1980]. Introduction to VLSI Systems, Addison-Wesley, Reading, Mass. Mellor-Crummey, J.

M., and M. L. Scott [1991]. "Algorithms for scalable synchronization on shared-memory multiprocessors," ACM Trans. on Computer Systems 9:1 (February), 21-65. Menabrea, L. F. [1842]. "Sketch of the analytical engine invented by Charles Babbage," Bibliothèque Universelle de Genève, 82 (October). Menon, A., J. Renato Santos, Y. Turner, G. Janakiraman, and W. Zwaenepoel [2005]. "Diagnosing performance overheads in the Xen virtual machine environment," Proc. First ACM/USENIX Int'l. Conf. on Virtual Execution Environments, June 11-12, 2005, Chicago, 13-23. Merlin, P. M., and P. J. Schweitzer [1980]. "Deadlock avoidance in store-and-forward networks. Part I. Store-and-forward deadlock," IEEE Trans. on Communications COM-28:3 (March), 345-354. Metcalfe, R.

M. [1993]. "Computer/network interface design: Lessons from Arpanet and Ethernet," IEEE J. on Selected Areas in Communications 11:2 (February), 173-180. Metcalfe, R. M., and D. R. Boggs [1976]. "Ethernet: Distributed packet switching for local computer networks," Communications of the ACM 19:7 (July), 395-404. Metropolis, N., J. Howlett, and G. C. Rota (eds.) [1980]. A History of Computing in the Twentieth Century, Academic Press, New York. Meyer, R. A., and L. H. Seawright [1970]. "A virtual machine time sharing system," IBM Systems J. 9:3, 199-218. Meyers, G. J. [1978]. "The evaluation of expressions in a storage-to-storage architecture," Computer Architecture News 7:3 (October), 20-23. Meyers, G. J. [1982]. Advances in Computer Architecture, 2nd ed., Wiley, New York. Micron. [2004]. "Calculating Memory System Power for DDR2," . micron.com/pdf/pubs/designline/dl1Q04.pdf.

www.sgi.com/MIPS/arch/MIPS16/mips16.pdf. Miranker, G. S., J. Rubenstein, and J. Sanguinetti [1988]. "Squeezing a Cray-class supercomputer into a single-user package," Proc. IEEE COMPCON, February 29-March 4, 1988, San Francisco, 452-456. Mitsubishi, Cypress, Calif. Mitsubishi, 1989]. "The Transputer: The time is now," Computer M32R Family Software Manual, Mitsubishi, Cypress, Calif. Miura, K., and K. Uchida [1983]. "FACOM vector processing system: VP100/200," Proc. NATO Advanced Research Workshop on High-Speed Computing, June 20-22, 1983, Jülich, West Germany. Also appears in K. Hwang, ed., "Superprocessors: Design and applications," IEEE (August 1984), 59-73. Miya, E. N. [1985]. "Multiprocessor/distributed processing bibliography," Computer Architecture News 13:1, 27-29. Montoye, R. K., E. Hokenek, and S. L. Runyon [1990]. "Design of the IBM RISC System/6000 floating-point execution," IBM J. Research and Development 34:1, 59-70. Moore, B., A. Padegs, R. Smith, and W. Bucholz [1987]. "Concepts of the System/370 vector architecture," 14th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-5, 1987, Pittsburgh, Penn., 282-292. Moore, G. E. [1965]. "Cramming more components onto integrated circuits," Electronics, 38:8 (April 19), 114-117. Morse, S., B. Ravenal, S. Mazor, and W. Pohlman [1980]. "Intel microprocessors--8080 to 8086," Computer 13:10 (October). Moshovos, A., and G. S. Sohi [1997]. "Streamlining inter-operation memory communication via data dependence prediction,"

Proc. 30th Annual Int'l. Symposium on Microarchitecture, December 1-3, Research Triangle Park, N.C., 235-245. Moshovos, A., S. Breach, T. N. Vijaykumar, and G. S. Sohi [1997]. "Dynamic speculation and synchronization of data dependences," 24th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-4, 1997, Denver, Colo. Moussouris, J., L. Crudele, D. Freitas, C. Hansen, E. Hudson, S. Przybylski, T. Riordan, and C. Rowen [1986]. "A CMOS RISC processor with integrated system functions," Proc.

IEEE COMPCON, March 3-6, 1986, San Francisco, 191. Mowry, T. C., S. Lam, and A. Gupta [1992]. "Design and evaluation of a compiler algorithm for prefetching," Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 12-15, 1992, Boston (SIGPLAN Notices 27:9 (September), 62-73). MSN Money. [2005]. "Amazon Shares Tumble after Rally Fizzles," .msn.com/content/CNBCTV/Articles/Dispatches/P133695.asp. Muchnick, S. S. [1988]. "Optimizing compilers for SPARC," Sun Technology 1:3 (Summer), 64-77. Mueller, M., L. C. Alves, W. Fischer, M. L. Fair, and I. Modi [1999]. "RAS strategy for IBM S/390 G5 and G6," IBM J. Research and Development 43:5-6 (September-November), 875-888.

Mukherjee, S. S., C. Weaver, J. S. Emer, S. K. Reinhardt, and T. M. Austin [2003]. "Measuring architectural vulnerability factors," IEEE Micro 23:6, 70-75. Murphy, B., and T. Gent [1995]. "Measuring system and software reliability using an automated data collection process," Quality and Reliability Engineering International 11:5 (September-October), 341-353. Myer, T. H., and I. E. Sutherland [1968]. "On the design of display processors," Communications of the ACM 11:6 (June), 410-414. Narayanan, D., E. Thereska, A. Donnelly, S. Elnikety, and A. Rowstron [2009]. "Migrating server storage to SSDs: Analysis of trade-offs," Proc. 4th ACM European Conf. on Computer Systems, April 1-3, 2009, Nuremberg, Germany. National Research Council. [1997]. The Evolution of Untethered Communications Board, National Academy Press, Washington, D.C. National Storage Industry Consortium. [1998]. "Tape Roadmap," www.nsic.org. Nelson, V.

P. [1990]. "Fault-tolerant computing: Fundamental concepts," Computer 23:7 (July), 19-25. Ngai, T.-F., and M. J. Irwin [1985]. "Regular, area-time efficient carry-lookahead adders," Proc. Seventh IEEE Symposium on Computer Arithmetic, June 4-6, 1985, University of Illinois, Urbana, 9-15. Nicolau, A., and J. A. Fisher [1984].

"Measuring the parallelism available for very long instruction word architectures," IEEE Trans. on Computers C-33:11 (November), 968-976. Nikhil, R. S., G. M. Papadopoulos, and Arvind [1992]. "\*T: A multithreaded massively parallel architecture," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 156-167. Noordergraaf, L., and R. van der Pas [1999]. "Performance experiences on Sun's WildFire prototype," Proc. ACM/IEEE Conf. on Supercomputing, November 13-19, 1999, Portland, Ore. Nyberg, C. R., T. Barclay, Z. Cvetanovic, J. Gray, and D.

Lomet [1994]. "AlphaSort: A RISC machine sort," Proc. ACM SIGMOD, May 24-27, 1994, Minneapolis, Minn. Oka, M., and M. Suzuoki [1999]. "Designing and programming the emotion engine," IEEE Micro 19:6 (November-December), 20-28. Okada, Y. Matsuda, T. Yamada, and A. Kobayashi [1999]. "System on a chip for digital still camera," IEEE Trans. on Consumer Electronics 45:3 (August), 584-590. Oliker, L., A. Canning, J. Carter, J. Shalf, and S. Ethier [2004]. "Scientific computations on modern parallel vector systems," Proc. ACM/IEEE Conf. on Supercomputing, November 6-12, 2004, Pittsburgh, Penn., 10. Pabst, T. [2000]. "Performance Showdown at 133 MHz FSB--The Best Platform for Coppermine,"

www6.tomshardware.com/mainboard/00q1/000302/. Padua, D., and M. Wolfe [1986]. "Advanced compiler optimizations for supercomputers," Communications of the ACM 29:12 (December), 1184-1201. Palacharla, S., and R.

E. Kessler [1994]. "Evaluating stream buffers as a secondary cache replacement," Proc. 21st Annual Int'l. Symposium on Computer Architecture (ISCA), April 18-21, 1994, Chicago, 24-33. Palmer, J., and S. Morse [1984]. The 8087 Primer, John Wiley & Sons, New York, 93. Pan, S.-T., K. So, and J.

T. Rahmeh [1992]. "Improving the accuracy of dynamic branch prediction using branch correlation," Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 12-15, 1992, Boston, 76-84. Partridge, C. [1994]. Gigabit Networking, Addison-Wesley, Reading, Mass. Patterson, D. [1985]. "Reduced instruction set computers," Communications of the ACM 28:1 (January), 8-21. Patterson, D. [2004]. "Latency lags bandwidth," Communications of the ACM 47:10 (October), 71-75. Patterson, D.

Ditzel [1980]. "The case for the reduced instruction set computer," Computer Architecture News 8:6 (October), 25-33. Patterson, D. A., and J. L. Hennessy [2004]. Computer Organization and Design: The Hardware/Software Interface, 3rd ed., Morgan Kaufmann, San Francisco. Patterson, D. A., G. A. Gibson, and R. H. Katz [1987]. A Case for Redundant Arrays of Inexpensive Disks (RAID), Tech. Rep. UCB/CSD 87/391, University of California, Berkeley. Also appeared in Proc. ACM SIGMOD, June 1-3, 1988, Chicago, 109-116. Patterson, D.

A., P. Garrison, M. Hill, D. Lioupis, C. Nyberg, T. Sippel, and K. Van Dyke [1983]. "Architecture of a VLSI instruction cache for a RISC," 10th Annual Int'l. Conf. on Computer Architecture Conf. Proc., June 13-16, 1983, Stockholm, Sweden, 108-116. Pavan, P., R. Bez, P. Olivo, and E. Zanoni [1997]. "Flash memory cells--an overview," Proc. IEEE 85:8 (August), 1248-1271. Peh, L. S., and W. J. Dally [2001]. "A delay model and speculative architecture for pipelined routers," Proc. 7th Int'l. Symposium on High-Performance Computer Architecture, January 22-24, 2001, Monterrey, Mexico. Peng, V., S. Samudrala, and M. Gavrielov [1987]. "On the implementation of shifters, multipliers, and dividers in

VLSI floating point units," Proc. 8th IEEE Symposium on Computer Arithmetic, May 19-21, 1987, Como, Italy, 95-102. Pfister, G. F. [1998]. In Search of Clusters, 2nd ed., Prentice Hall, Upper Saddle River, N. J. Pfister, G. F., W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A.

Melton, V. A. Norton, and J. Weiss [1985]. "The IBM research parallel processor prototype (RP3): Introduction and architecture," Proc. 12th Annual Int'l. Symposium on Computer Architecture (ISCA), June 17-19, 1985, Boston, Mass., 764-771. Pinheiro, E., W. D. Weber, and L.

Barroso [2007]. "Failure trends in a large disk drive population," Proc. 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, Calif. Pinkston, T. M. [2004]. "Deadlock characterization and resolution in interconnection networks," in M. C. Zhu and M. P.

Fanti, eds., Deadlock Resolution in Computer-Integrated Systems, CRC Press, Boca Raton, FL, 445-492. Pinkston, T. M., and J. Shin [2005]. "Trends toward on-chip networked microsystems," Int'l. J. of High Performance Computing and Networking 3:1, 3-18. Pinkston, T. M., and S. Warnakulasuriya [1997]. "On deadlocks in interconnection networks," 24th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-4, 1997, Denver, Colo. Pinkston, T. M., A. Benner, M. Krause, I. Robinson, and T.

Sterling [2003], "InfiniBand: The 'de facto' future standard for system and local area networks or just a scalable replacement for PCI buses?" Cluster Computing (special issue on communication architecture for clusters) 6:2 (April), 95-104. Postiff, M. A., D. A. Greene, G. S. Tyson, and T. N. Mudge [1999]. "The limits of instruction level parallelism in SPEC95 applications," Computer Architecture News 27:1 (March), 31-40. Przybylski, S. A. [1990]. Cache Design: A Performance-Directed Approach, Morgan Kaufmann, San Francisco. Przybylski, S. A., M. Horowitz, and J. L. Hennessy [1988]. "Performance trade-offs in cache design," 15th Annual Int'l. Symposium on Computer Architecture, May 30-June 2, 1988, Honolulu, Hawaii, 290-298. Puente, V., R. Beivide, J. A. Gregorio, J. M. Prellezo, J.

"The 801 minicomputer," Proc. Symposium Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 1-3, 1982, Palo Alto, Calif., 39-47. Bordawekar, R., U. Bondhugula, and R. Rao [2010]. "Believe it or not!: Multi-core CPUs can match GPU performance for a FLOP-intensive application!" 19th International Conference on Parallel Architecture and Compilation Techniques (PACT 2010), September 11-15, 2010, Vienna, Austria, 537-538. Ramamoorthy, C. V., and H. F. Li [1977]. "Pipeline architecture," ACM Computing Surveys 9:1 (March), 61-102. Ranganathan, P., P. Leech, D. Irwin, and J. Chase [2006]. "Ensemble-Level Power Management for Dense Blade Servers," Proc. 33rd Annual Int'l. Symposium on Computer Architecture (ISCA), June 17-21, 2006, Boston, Mass., 66-77. Rau, B. R. [1994].

"Iterative modulo scheduling: An algorithm for software pipelining loops," Proc. 27th Annual Int'l. Symposium on Microarchitecture, November 30-December 2, 1994, San Jose, Calif., 63-74. Rau, B. R., C. D. Glaeser, and R. L. Picard [1982]. "Efficient code generation for horizontal architectures: Compiler techniques and architecture (ISCA), April 26-29, 1982, Austin, Tex., 131-139. Rau, B. R., D. W. L. Yen, W. Yen, and R. A. Towle [1989]. "The Cydra 5 departmental

supercomputer: Design philosophies, decisions, and trade-offs," IEEE Computers 22:1 (January), 12-34. Reddi, V. J., B. C. Lee, T. Chilimbi, and K. Vaid [2010]. "Web search using mobile cores: Quantifying and mitigating the price of efficiency," Proc. 37th Annual Int'l. Symposium on Computer Architecture (ISCA), June 19-23, 2010, Saint-Malo, France. Redmond, K. C., and T. M. Smith [1980]. Project Whirlwind--The History of a Pioneer Computer, Digital Press, Boston. Reinhardt, S. K., J. R. Larus, and D. A. Wood [1994]. "Tempest and Typhoon: User-level shared memory," 21st Annual Int'l. Symposium on Computer Architecture (ISCA), April 18-21, 1994, Chicago, 325-336. Reinman, G., and N. P. Jouppi [1999]. "Extensions to CACTI," research.compaq.com/wrl/people/jouppi/CACTI.html. Rettberg, R. D., W. R. Crowther, P. P. Carvey, and R. S. Tomlinson [1990]. "The Monarch parallel processor hardware design," IEEE Computer 23:4 (April), 18-30. Riemens, A., K. A. Vissers, R. J. Schutten, F. W. Sijstermans, G. J. Hekstra, and G. D. La Hei [1999]. "Trimedia CPU64 application domain and benchmark suite," Proc. IEEE Int'l. Conf. on Computer Design: VLSI in Computers and Processors (ICCD'99), October 10-13, 1999, Austin, Tex., 580-585. Riseman, E. M., and C. C. Foster [1972]. "Percolation of code to enhance parallel dispatching and execution," IEEE Trans. on Computers C-21:12 (December), 1411-1415. Robin, J., and C. Irvine [2000]. "Analysis of the Intel Pentium's ability to support a secure virtual machine monitor," Proc. USENIX Security Symposium, August 14-17, 2000, Denver, Colo. Robinson, B., and L. Blount [1986]. The VM/HPO 3880-23 Performance Results, IBM Tech. Bulletin GG66-0247-00, IBM Washington

Gupta [1995]. "Complete computer simulation: The SimOS approach," in IEEE Parallel and Distributed Technology (now called Concurrency ) 4:3, 34-43. Rowen, C., M. Johnson, and P. Ries [1988]. "The MIPS R3010 floating-point coprocessor," IEEE Micro 8:3 (June), 53-62. Russell, R. M. [1978]. "The Cray-1 processor system," Communications of the ACM 21:1 (January), 63-72. Rymarczyk, J. [1982]. "Coding guidelines for pipelined processors," Proc. Symposium Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 1-3, 1982, Palo Alto, Calif., 12-19. Saavedra-Barrera, R. H. [1992]. "CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking," Ph. D. dissertation, University of California, Berkeley. Salem, K., and H. Garcia-Molina [1986]. "Disk striping," Proc. 2nd Int'l. IEEE Conf. on Data Engineering, February 5-7, 1986, Washington, D.C., 249-259. Saltzer, J. H., D. P. Reed, and D. D. Clark [1984]. "End-to-end arguments in system design," ACM Trans. on Computer Systems 2:4 (November), 277-288. Samples,

Systems Center, Gaithersburg, Md.Ropers, A., H. W. Lollman, and J. Wellhausen [1999]. DSPstone: Texas Instruments TMS320C54x, Tech. Rep. IB 315 1999/9-ISS-Version 0.9, Aachen University of Technology, Aaachen, Germany (www.ert.rwth-aachen.de/Projekte/Tools/coal/dspstone c54x/index.html).Rosenblum, M., S. A. Herrod, E. Witchel, and A.

A. D., and P. N. Hilfinger [1988]. Code Reorganization for Instruction Caches, Tech. Rep. UCB/CSD 88/447, University of California, Berkeley. Santoro, M. R., G. Bewick, and M. A. Horowitz [1989]. "Rounding algorithms for IEEE multipliers," Proc. Ninth IEEE Symposium on Computer Arithmetic, September 6-8, Santa Monica, Calif., 176-183. Satran, J., D. Smith, K. Meth, C. Sapuntzakis, M. Wakeley, P. Von Stamwitz, R. Haagens, E. Zeidner, L. Dalle Ore, and Y. Klein [2001]. "ISCSI," IPS Working Group of IETF, Internet drafts www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-07.txt. Saulsbury, A., T. Wilkinson, J. Carter, and A. Landin [1995]. "An argument for Simple COMA," Proc. First IEEE Symposium on High-Performance Computer Architectures, January 22-25, 1995, Raleigh, N.C., 276-285. Schneck, P. B. [1987]. Superprocessor Architecture, Kluwer Academic Publishers, Norwell, Mass.Schroeder, B., and G. A. Gibson [2007]. "Understanding failures in petascale computers," J. of Physics Conf. Series 78(1), 188-198. Schroeder, B., E. Pinheiro, and W.-D. Weber [2009]. "DRAM errors in the wild: a largescale field study," Proc. Eleventh Int'l. Joint Conf. on

Measurement and Modeling of Computer Systems (SIGMETRICS), June 15-19, 2009, Seattle, Wash. Schurman, E., and J. Brutlag [2009]. "The user and business impact of server delays," Proc. Velocity: Web Performance and Operations Conf., June 22-24, 2009, San Jose, Calif. Schwartz, J. T. [1980]. "Ultracomputers," ACM Trans. on Programming Languages and Systems 4:2, 484-521. Scott, N. R. [1985]. Computer Number Systems and Arithmetic, Prentice Hall, Englewood Cliffs, N. J. Scott, S. L. [1996]. "Synchronization and communication in the T3E multiprocessor," Seventh Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1-5, 1996, Cambridge, Mass. Scott, S. L., and J. Goodman [1994]. "The impact of pipelined channels on k-ary n-cube networks," IEEE Trans. on Parallel and Distributed Systems 5:1 (January), 1-16. Scott, S. L., and G. M. Thorson [1996]. "The Cray T3E network: Adaptive routing in a high performance 3D torus," Proc. IEEE HOT Interconnects '96, August 15-17, 1996, Stanford University, Palo Alto, Calif., 14-156. Scranton, R. A., D. A. Thompson, and D. W. Hunter [1983]. The Access Time Myth, Tech. Rep. RC 10197 (45223), IBM, Yorktown Heights, N.Y. Seagate. [2000]. Seagate Cheetah 73 Family: ST173404LW/LWV/LC/LCV Product Manual, Vol. 1,

Duato, and C. Izu [1999]. "Adaptive bubble router: A design to improve performance in torus networks," Proc. 28th Int'l. Conference on Parallel Processing, September 21-24, 1999, Aizu-Wakamatsu, Fukushima, Japan. Radin, G. [1982].

(concurrent computing)," Communications of the ACM 28:1 (January), 22-33. Senior, J. M. [1993]. Optical Fiber Communications: Principles and Practice, 2nd ed., Prentice Hall, Hertfordshire, U. K. Sharangpani, H., and K. Arora [2000]. "Itanium Processor Microarchitecture," IEEE Micro 20:5 (September-October), 24-43. Shurkin, J. [1984]. Engines of the Mind: A History of the Computer, W. W. Norton, New York. Shustek, L. J. [1978]. "Analysis and Performance of Computer Instruction Sets," Ph. D. dissertation, Stanford University, Palo Alto, Calif. Silicon Graphics. [1996]. MIPS V Instruction Set (see . Singh, J. P., J. L. Hennessy, and A. Gupta [1993]. "Scaling parallel programs for multiprocessors: Methodology and examples," Computer 26:7 (July), 22-33. Sinharoy, B., R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner [2005]. "POWER5 system microarchitecture," IBM J. Research and Development, 49:4-5, 505-521. Sites, R. [1979]. Instruction Ordering for the CRAY-1 Computer, Tech. Rep. 78-CS-023, Dept. of Computer Science, University of California, San Diego. Sites, R. L., and R. Witek, (eds.) [1995]. Alpha Architecture Reference Manual, 2nd ed., Digital Press, Burlington, Mass.

Newton, Mass. Skadron, K., and D. W. Clark [1997]. "Design issues and tradeoffs for write buffers," Proc. Third Int'l. Symposium on High-Performance Computer Architecture, February 1-5, 1997, San Antonio, Tex., 144-155. Skadron, K., P. S. Ahuja, M. Martonosi, and D. W. Clark [1999]. "Branch prediction, instruction-window size, and cache size: Performance tradeoffs and simulation techniques," IEEE Trans. on Computers 48:11 (November). Slater, R. [1987]. "The Solomon computer," Proc. AFIPS Fall Joint

Computer Conf., December 4-6, 1962, Philadelphia, Penn., 97-107. Smith, A. J. [1982]. "Cache memories," Computer 17:1 (January), 6-22. Smith, B. J. [1978]. "A pipelined, shared resource MIMD computer," Proc. Int'l. Conf. on Parallel Processing (ICPP), August, Bellaire, Mich., 6-8. Smith, B. J. [1981]. "Architecture and applications of the HEP multiprocessor system," Real-Time Signal Processing IV 298 (August), 241-248. Smith, J. E. [1981]. "A study of branch prediction strategies," Proc. Eighth Annual Int'l. Symposium on Computer Architecture (ISCA), May 12-14,

1981, Minneapolis, Minn., 135-148. Smith, J. E. [1984]. "Decoupled access/execute computer architectures," ACM Trans. on Computer Systems 2:4 (November), 289-308. Smith, J. E. [1988]. "Characterizing computer performance with a single number," Communications of the ACM 31:10 (October), 1202-1206. Smith, J. E., and J. R. Goodman [1983]. "A study of instruction cache organizations and replacement policies," Proc. 10th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1982, Stockholm, Sweden, 132-137. Smith, J. E., and A. R. Pleszkun [1988]. "Implementing precise interrupts in pipelined processors," IEEE Trans. on Computers 37:5 (May), 562-573.

(This paper is based on an earlier paper that appeared in Proc. 12th Annual Int'l. Symposium on Computer Architecture (ISCA), June 17-19, 1985, Boston, Mass.) Smith, J. E., G. E. Dermer, B. D. Vanderwarn, S. D. Klinger, C. M. Rozewski, D. L. Fowler, K. R. Scidmore, and J. P. Laudon [1987]. "The ZS-1 central processor," Proc. Second Int'l. Conf.

on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 5-8, 1987, Palo Alto, Calif., 199-204. Smith, M. D., M. Horowitz, and M. S. Lam [1992]. "Efficient superscalar performance through boosting," Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 12-15, 1992, Boston, 248-259.

"A computer oriented towards spatial problems," Proc. Institute of Radio Engineers 46:10 (October), 1744-1750. Vahdat, A., M. Al-Fares, N. Farrington, R.

Smith, M. D., M. Johnson, and M. A. Horowitz [1989]. "Limits on multiple instruction issue," Proc.

Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 3-6, 1989, Boston, 290-302. Smotherman, M. [1989]. "A sequencing-based taxonomy of I/O systems and review of historical machines," Computer Architecture News 17:5 (September), 5-15. Reprinted in Computer Architecture Readings, M. D. Hill, P. Jouppi, and G. S. Sohi, eds., Morgan Kaufmann, San Francisco, 1999, 451-461. Sodani, A., and G. Sohi [1997].

"Dynamic instruction reuse," Proc. 24th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-4, 1997, Denver, Colo. Sohi, G. S. [1990]. "Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers," IEEE Trans. on Computers 39:3 (March), 349-359. Sohi, G. S., and S. Vajapeyam [1989]. "Tradeoffs in instruction format design for horizontal architectures," Proc. Third Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 3-6, 1989, Boston, 15-25. Soundararajan, V., M. Heinrich, B. Verghese, K. Gharachorloo, A.

Gupta, and J. L. Hennessy [1998]. "Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors," Proc.

25th Annual Int'l. Symposium on Computer Architecture (ISCA), July 3-14, 1998, Barcelona, Spain, 342-355. SPEC. [1989]. SPEC Benchmark Suite Release 1.0 (October 2). COMPCON, February 29-March 4, 1988, San Francisco, 464. Spurgeon, C. [2001], "Charles Spurgeon, C. [2006], "Charles Spurgeon, performance evaluation of cache-coherent NUMA and COMA architectures," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia, 80-91. Sterling, T. [2001]. Beowulf PC Cluster Computing with Linux, MIT Press, Cambridge, Mass. Stern, N. [1980]. "Who invented the first electronic digital computer?" Annals of the History of Computing 2:4 (October), 375-376. Stevens, W. R. [1994-1996]. TCP/IP Illustrated (three volumes), Addison-Wesley, Reading, Mass. Stokes, J. [2000]. "Sound and Vision: A Technical Overview of the Emotion Engine," arstechnica.com/reviews/1q00/playstation2/ee-1.html. Stone, H. [1991]. High Performance Computers, Addison-Wesley, New York. Strauss, W. [1998]. "DSP Strategies 2002," www.usadata.com/market\_research/spr\_05/spr\_r127-005.htm. Strecker, W. D. [1976]. "Cache memories for the PDP-11?," Proc. Third Annual Int'l. Symposium on Computer Architecture (ISCA), January 19-21, 1976, Tampa, Fla., 155-158. Strecker, W. D. [1978]. "VAX-11/780: A virtual address extension of the PDP-11 family," Proc. AFIPS National Computer Conf., June 5-8, 1978, Anaheim, Calif., 47, 967-980. Sugumar, R. A., and S. G. Abraham [1993]. "Efficient simulation of caches under optimal replacement with applications to miss characterization," Proc. ACM

on Measurement and Modeling of Computer Systems, May 17-21, 1993, Santa Clara, Calif., 24-35. Sun Microsystems, Santa Clara, Calif. Sussenguth, E. [1999]. "IBM's ACS-1 Machine," IEEE Computer 22:11 (November). Swan, R. J., S. H. Fuller, and D. P. Siewiorek [1977]. "Cm\*--a modular, multimicroprocessor," Proc. AFIPS National Computing Conf., June 13-16, 1977, Dallas, Tex., 637-644. Swan, R. J., A. Bechtolsheim, K. W. Lai, and J. K. Ousterhout [1977]. "The implementation of the Cm\* multi-microprocessor," Proc. AFIPS National Computing Conf., June 13-16, 1977, Dallas, Tex., 645-654. Swartzlander, E. (ed.) [1990]. Computer Arithmetic, IEEE Computer Society Press, Los Alamitos, Calif. Takagi, N., H. Yasuura, and S. Yajima [1985]. "High-speed VLSI multiplication algorithm with a redundant binary addition tree," IEEE Trans. on Computers C-34:9, 789-796. Talagala, N. [2000].

"Characterizing Large Storage Systems: Error Behavior and Performance Benchmarks," Ph. D.

dissertation, Computer Science Division, University of California, Berkeley. Talagala, N., and D. Patterson [1999]. An Analysis of Error Behavior in a Large Storage System, Tech. Report UCB//CSD-99-1042, Computer Science Division, University of California, Berkeley. Talagala, N., R. Arpaci-Dusseau, and D. Patterson [2000]. Micro-Benchmark Based Extraction of Local and Global Disk Characteristics, CSD-99-1063, Computer Science Division, University of California, Berkeley. Talagala, N., S. Asami, D. Patterson, R.

Futernick, and D. Hart [2000]. "The art of massive storage: A case study of a Web image archive," Computer (November). Tamir, Y., and G. Frazier [1992]. "Dynamically-allocated multi-queue buffers for VLSI communication switches," IEEE Trans. on Computers 41:6 (June), 725-734. Tanenbaum, A. S. [1978]. "Implications of structured programming for machine architecture," Communications of the ACM 21:3 (March), 237-246. Tanenbaum, A. S. [1988]. Computer Networks, 2nd ed., Prentice Hall, Englewood Cliffs, N.

Tang, C. K. [1976]. "Cache design in the tightly coupled multiprocessor system," Proc. AFIPS National Computer Conf., June 7-10, 1976, New York, 749-753. Tanqueray, D. [2002]. "The Cray X1 and supercomputer road map," Proc. 13th Daresbury Machine Evaluation Workshop, December 11-12, 2002, Daresbury Laboratories, Daresbury, Cheshire, U. K. Tarjan, D., S. Thoziyoor, and N. Jouppi [2005]. "HPL Technical Report on CACTI 4.0," www.hpl.hp.com/techreports/2006/HPL-2006-86.html. Taylor, G. S. [1981]. "Compatible hardware for division and square root," Proc.

5th IEEE Symposium on Computer Arithmetic, May 18-19, 1981, University of Michigan, Ann Arbor, Mich., 127-134. Taylor, G. S. [1985]. "Radix 16 SRT dividers with overlapped quotient selection stages," Proc.

Seventh IEEE Symposium on Computer Arithmetic, June 4-6, 1985, University of Illinois, Urbana, Ill., 64-71. Taylor, G., P. Hilfinger, J. Larus, D. Patterson, and B. Zorn [1986]. "Evaluation of the SPUR LISP architecture," Proc. 13th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-5, 1986, Tokyo. Taylor, M. B., W. Lee, S. P. Amarasinghe, and A. Agarwal [2005]. "Scalar operand networks," IEEE Trans. on Parallel and Distributed Systems 16:2 (February), 145-162. Tendler, J. M., J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy [2002]. "Power4 system microarchitecture," IBM J.

Research and Development 46:1, 5-26. Texas Instruments. [2000]. "History of Innovation: 1980s," www.ti.com/corp/docs/company/history/1980s.shtml. Tezzaron Semiconductor, Naperville, Ill. (. Thacker, C. P., E. M. McCreight, B. W. Lampson, R. F. Sproull, and D. R. Boggs [1982]. "Alto: A personal computer," in D. P. Siewiorek, C. G. Bell, and A. Newell, eds., Computer Structures: Principles and Examples, McGraw-Hill, New York, 549-572. Thadhani, A. J. [1981]. "Interactive user productivity," IBM Systems J. 20:4, 407-423. Thekkath, R., A. P.

Singh, J. P. Singh, S. John, and J. L. Hennessy [1997]. "An evaluation of a commercial CC-NUMA architecture--the CONVEX Exemplar SPP1200," Proc. 11th Int'l. Parallel Processing Symposium (IPPS), April 1-7, 1997, Geneva, Switzerland.

Thorlin, J. F. [1967]. "Code generation for PIE (parallel instruction execution) computer Conf., April 18-20, 1964, San Francisco, 26, 33-40. Thornton, J. E. [1964]. "Parallel operation in the Control Data 6600," Proc. AFIPS Fall Joint Computer Conf., Part II, October 27-29, 1964, San Francisco, 26, 33-40. Thornton, J. E. [1970]. Design of a Computer, the Control Data 6600, Scott, Foresman, Glenview, Ill. Tjaden, G. S., and M. J. Flynn [1970]. "Detection and parallel execution of independent instructions," IEEE Trans. on Computers C-19:10 (October), 889-895. Tomasulo, R. M. [1967]. "An efficient algorithm for exploiting multiple arithmetic units," IBM J. Research and Development 11:1 (January), 25-33. Torrellas, J., A.

Gupta, and J. Hennessy [1992]. "Characterizing the caching and synchronization performance of a multiprocessor operating Systems (ASPLOS), October 12-15, 1992, Boston (SIGPLAN Notices 27:9 (September), 162-174). Touma, W. R. [1993]. The Dynamics of the Computer Industry: Modeling the Supply of Workstations and Their Components, Kluwer Academic, Boston. Tuck, N., and D. Tullsen [2003]. "Initial observations of the simultaneous multithreading Pentium 4 processor," Proc. 12th Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'03), September 27-October 1, 2003, New Orleans, La., 26-34. Tullsen, D. M., S. J. Eggers, and H. M. Levy [1995]. "Simultaneous multithreading: Maximizing on-chip parallelism," Proc. 22nd Annual Int'l. Symposium on Computer Architecture (ISCA), June 22-24, 1995, Santa

Margherita, Italy, 392-403. Tullsen, D. M., S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm [1996]. "Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor," Proc. 23rd Annual Int'l. Symposium on Computer Architecture (ISCA), May 22-24, 1996, Philadelphia, Penn., 191-202. Ungar, D., R. Blau, P. Foley, D. Samples, and D. Patterson [1984]. "Architecture of SOAR: Smalltalk on a RISC," Proc. 11th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1984, Ann Arbor, Mich., 188-197. Unger, S.

Niranjan Mysore, G. Porter, and S. Radhakrishnan [2010]. "Scale-Out Networking in the Data Center," IEEE Micro 30:4 (July/August), 29-41. Vaidya, A. S., A. Sivasubramaniam, and C. R. Das [1997]. "Performance benefits of virtual channels and adaptive routing: An application-driven study," Proc. ACM/IEEE Conf. on Supercomputing, November 16-21, 1997, San Jose, Calif. Vajapeyam, S. [1991]. "Instruction-Level Characterization of the Cray Y-MP Processor," Ph. D. thesis, Computer Sciences Department, University of Wisconsin-Madison. van Eijndhoven, J. T. J., F. W. Sijstermans, K. A. Vissers, E. J. D. Pol, M. I. A. Tromp, P. Struik, R. H. J. Bloks, P. van der Wolf, A. D. Pimentel, and H. P. E. Vranken [1999]. "Trimedia CPU64 architecture," Proc. IEEE Int'l. Conf. on Computer Design: VLSI in Computers and Processors (ICCD'99), October 10-13, 1999, Austin, Tex., 586-592. Van Vleck, T. [2005]. "The IBM 360/67 and CP/CMS," Eicken, T., D. E. Culler, S. C. Goldstein, and K. E. Schauser [1992]. "Active Messages: A mechanism for integrated communication and computation," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1992, Gold Coast, Australia. Waingold, E., M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal [1997].

"Baring it all to software: Raw Machines," IEEE Computer 30 (September), 86-93. Wakerly, J. [1989]. Microcomputer Architecture and Programming, Wiley, New York. Wall, D. W. [1991]. "Limits of instruction-level parallelism," Proc. Fourth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 8-11, 1991, Palo Alto, Calif., 248-259. Wall, D. W. [1993]. Limits of Instruction-Level Parallelism, Research Rep. 93/6, Western Research Laboratory, Digital Equipment Corp., Palo Alto, Calif.Walrand, J. [1991]. Communication Networks: A First Course, Aksen Associates/Irwin, Homewood, Ill. Wang, W.-H., J.-L. Baer, and H. M. Levy [1989]. "Organization and performance of a two-level virtual-real cache hierarchy," Proc. 16th Annual Int'l. Symposium on Computer Architecture (ISCA), May 28-June 1, 1989, Jerusalem, 140-148. Watanabe, T. [1987]. "Architecture and performance of the NEC supercomputer SX system," Parallel Computing 5, 247-255. Waters, F. (ed.) [1986]. IBM RT Personal Computer Technology, SA 23-1057, IBM, Austin, Tex. Watson, W. J. [1972]. "The TI ASC--a highly modular and flexible super processor architecture," Proc. AFIPS Fall Joint Computer Conf., December 5-7, 1972, Anaheim, Calif., 221-228. Weaver, D. L., and T. Germond [1994]. "Dhrystone: A synthetic systems programming benchmark," Communications of

the ACM 27:10 (October), 1013-1030. Weiss, S., and J. E. Smith [1984]. "Instruction issue logic for pipelined supercomputers," Proc. 11th Annual Int'l. Symposium on Computer Architecture (ISCA), June 5-7, 1984, Ann Arbor, Mich., 110-118. Weiss, S., and J. E. Smith [1987]. "A study of scalar compilation techniques for pipelined supercomputers," Proc. Second Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 5-8, 1987, Palo Alto, Calif., 105-109. Weiss, S., and J. E. Smith [1994]. Power and PowerPC, Morgan Kaufmann, San Francisco. Kalla, J. Friedrich, J. Kahle, J. Leenstra, C. Lichtenau, B. Sinharoy, W. Starke, and V. Zyuban [2010]. "The Power7 processor SoC," Proc. Int'l. Conf. on IC Design and Technology, June 2-4, 2010, Grenoble, France, 71-73. Weste, N., and K. Eshraghian [1993]. Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed., Addison-Wesley, Reading, Mass. Wiecek, C. [1982]. "A case study of the VAX 11 instruction set usage for compiler execution," Proc. Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 1-3, 1982, Palo Alto, Calif., 177-184. Wilkes, M. [1965]. "Slave memories and dynamic storage allocation," IEEE Trans. Electronic Computers EC-14:2 (April), 270-271. Wilkes, M. V. [1982]. "Hardware support for memory protection: Capability implementations," Proc. Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 1-3, 1982,

Palo Alto, Calif., 107-116. Wilkes, M. V. [1985]. Memoirs of a Computer Pioneer, MIT Press, Cambridge, Mass. Wilkes, M. V. [1995]. Computing Perspectives, Morgan Kaufmann, San Francisco. Wilkes, M. V., D. J. Wheeler, and S. Gill [1951]. The Preparation of Programs for an Electronic Digital Computer, Addison-Wesley, Cambridge, Mass. Williams, S., A. Waterman, and D. Patterson [2009]. "Roofline: An insightful visual performance model for multicore architectures," Communications of the ACM, 52:4 (April), 65-76. Williams, T. E., M. Horowitz, R. L. Alverson, and T. S. Yang [1987]. "A self-timed chip for division," in P. Losleben, ed., 1987 Stanford Conference on Advanced Research in VLSI, MIT Press, Cambridge, Mass. Wilson, A. W., Jr. [1987]. "Hierarchical cache/bus architecture for shared-memory multiprocessors," Proc. 14th Annual Int'l. Symposium on Computer Architecture (ISCA), June 2-5, 1987, Pittsburgh, Penn., 244-252. Wilson, R. P., and M. S. Lam [1995]. "Efficient context-sensitive pointer analysis for C programs," Proc. ACM SIGPLAN'95 Conf. on Programming Language Design and Implementation, June 18-21, 1995, La Jolla, Calif., 1-12. Wolfe, A., and J. P. Shen [1991]. "A variable instruction stream extension to the VLIW architecture," Proc. Fourth Int'l. Conf. on Architecture, Hill [1995].
"Cost-effective parallel computer 28:2 (February), 69-72. Wulf, W. [1981]. "Computer architecture," Computer Architecture, "Computer Conf., December 5-7, 1972, Anaheim, Calif., 765-777. Wulf, W., and S. P. Harbison [1978]. "Reflections in a pool of processors--an experience report on C.mmp/Hydra," Proc. AFIPS National Computing Conf. June 5-8, 1978, Anaheim, Calif., 939-951. Wulf, W. A., and S. A. McKee [1995]. "Hitting the memory wall: Implications of the obvious," ACM SIGARCH Computer Architecture News, 23:1 (March), 20-24. Wulf, W. A., R. Levin, and S. P.

Harbison [1981]. Hydra/C.mmp: An Experimental Computer System, McGraw-Hill, New York. Yamamoto, W., M. J. Serrano, A. R. Talcott, R. C. Wood, and M. Nemirovsky [1994]. "Performance estimation of multistreamed, superscalar processors," Proc. 27th Annual Hawaii Int'l. Conf. on System Sciences, January 4-7, 1994, Maui, 195-204. Yang, Y., and G. Mason [1991]. "Nonblocking broadcast switching networks," IEEE Trans. on Computers 40:9 (September), 1005-1015. Yeager, K. [1996]. "The MIPS R10000 superscalar microprocessor," IEEE Micro 16:2 (April), 28-40. Yeh, T., and Y. N. Patt [1993a]. "Alternative implementations of two-level adaptive branch prediction," Proc. 19th Annual Int'l. Symposium on Computer Architecture (ISCA), May 19-21, 1993, Gold Coast, Australia, 124-134. Yeh, T., and Y. N. Patt [1993b]. "A comparison of dynamic branch predictors that use two levels of branch history," Proc. 20th Annual Int'l. Symposium on Computer Architecture (ISCA), May 16-19, 1993, San Diego, Calif., 257-266.

PISCOT: A Pipelined Split-Transaction COTS-Coherent Bus for Multi-Core Real-Time Systems, ACM Transactions on Embedded Computing Systems, 22:1, (1-27), Online publication date: 31-Jan-2023. Naghibijouybari H, Koruyeh E and Abu-Ghazaleh N (2022). Microarchitectural Attacks in Heterogeneous Systems: A Survey, ACM Computing Surveys, 55:7, (1-40), Online publication date: 31-Jul-2023.Kong L, Tan J, Huang J, Chen G, Wang S, Jin X, Zeng P, Khan M and Das S (2022). Edge-computing-driven Internet of Things: A Survey, ACM Computing Surveys, 55:8, (1-41), Online publication date: 31-Aug-2023.Xu M, Ng W, Lim W, Kang J, Xiong Z, Niyato D, Yang Q, Shen X and Miao C (2023). A Full Dive Into Realizing the Edge-Enabled Metaverse: Visions, Enabling Technologies, and Challenges, IEEE Communications Surveys & Tutorials, 25:1, (656-700), Online publication date: 1-Jan-2023. Araújo De Medeiros D, Markidis S and Bo Peng I LibCOS: Enabling Converged HPC and Cloud Data Stores with MPI Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, (106-116)Mhatre S and Chandran P On the Measurement of Performance Metrics for Virtualization-Enhanced Architectures Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, (49-56)Friedman R, Goaz O and Hovav D PKache: A Generic Framework for Data Plane Caching Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, (1268-1276)Resch S, Cilasun H, Chowdhury Z, Zabihi M, Zhao Z, Wang J, Sapatnekar S and Karpuzcu U On Endurance of Processing in (Nonvolatile) Memory Proceedings of the 50th Annual International Symposium on Computer Architecture, (1-13)Orts F, Ortega G, Combarro E, Rúa I, Puertas A and Garzón E (2023). Efficient design of a quantum absolute-value circuit using Clifford+T gates, The Journal of Supercomputing, 79:11, (12656-12670), Online publication date: 1-Jul-2023. Khanna G, Chaturvedi S and Othman M (2023). 
On design and performance analysis of improved shuffle exchange gamma interconnection network layouts, The Journal of Supercomputing, 79:11, (11611-11640), Online publication date: 1-Jul-2023.Li X, Parazeres M, Oberman A, Ghaffari A, Asgharian M and Nia V (2023). EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models, SN Computer Science, 4:5, Online publication date: 30-Jun-

A Multi-tenant Key-value SSD with Secondary Index for Search Query Processing and Analysis, ACM Transactions on Embedded Computing Systems, 22:4, (1-27), Online publication date: 31-Jul-2023. Sahabandu D, Mertoguno J and Poovendran R (2023). A Natural Language Processing Approach for Instruction Set Architecture Identification, IEEE Transactions on Information Forensics and Security, 18, (4086-4099), Online publication date: 1-Jan-2023. Kong X, Zheng 2023. Villon L, Susskind Z, Bacellar A, Miranda I, de Araújo L, Lima P, Breternitz M, John L, França F and Dutra D (2023). A conditional branch predictor based on weightless neural networks, Neurocomputing, 555:C, Online publication date: 28-Oct-2023. Gade S and Deb S (2021). A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures, ACM Transactions on Design Automation of Electronic Systems, 27:1, (1-31), Online publication date: 31-Jan-2022. Wang M, Wen C and Chao H (2021). Roadrunner+: An Autonomous Intersection Management Cooperating with Connected Autonomous Vehicles and Pedestrians with Spillback Considered, ACM Transactions on Cyber-Physical Systems, 6:1, (1-29), Online publication date: 31-Jan-2022. Mbongue J, Kwadjo D, Shuping A and Bobda C (2022). Deploying Multi-tenant FPGAs within Linuxbased Cloud Infrastructure, ACM Transactions on Reconfigurable Technology and Systems, 15:2, (1-31), Online publication date: 30-Jun-2022. Priya Dharishini P and Ramana Murthy P Static Analyzer for Computing WCET of Multithreaded Programs using Hoare's CSP 15th Innovations in Software Engineering Conference, (1-12) Arras P, Andronidis A, Pina L, Mituzas K, Shu Q, Grumberg D and Cadar C (2022). SaBRe: load-time selective binary rewriting, International Journal on Software Tools for Technology Transfer (STTT), 24:2, (205-223), Online publication date: 1-Apr-2022. Jiang Z, Dong P, Wei R, Zhao Q, Wang Y, Zhu D, Zhuang Y and Audsley N (2022). 
PSpSys, Journal of Systems Architecture: the EUROMICRO Journal, 123:C, Online publication date: 1-Feb-2022.Xiong W and Szefer J (2021). Survey of Transient Execution Attacks and Their Mitigations, ACM Computing Surveys, 54:3, (1-36), Online publication date: 30-Apr-2022.Berg B, Whitehouse J, Moseley B, Wang W and Harchol-Balter M (2021). The case for phase-aware scheduling of parallelizable jobs, Performance Evaluation, 153:C, Online publication date: 1-Feb-2022.Guerrero-Balaguera J, Condia J and Reorda M A compaction method for STLs for GPU in-field test Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, (454-459)Gudaparthi S and Shrestha R (2022).

Selective register-file cache: an energy saving technique for embedded processor architecture, Design Automation for Embedded Systems, 26:2, (105-124), Online publication date: 1-Jun-2022.Orts F, Ortega G, Filatovas E and M. Garzón E (2022). Implementation of three efficient 4-digit fault-tolerant quantum carry lookahead adders, The Journal of

Supercomputing, 78:11, (13323-13341), Online publication date: 1-Jul-2022. Mahafzah B, Al-Adwan A and Zaghloul R (2022). Topological properties assessment of optoelectronic architectures, Telecommunications Systems, 80:4, (599-627), Online publication date: 1-Aug-2022. Zhang J, Cheng Y, Deng X, Wang B, Xie J, Yang Y and Zhang M (2022). A Reputation-Based Mechanism for Transaction Processing in Blockchain Systems, IEEE Transactions on Computers, 71:10, (2423-2434), Online publication date: 1-Oct-2022. Gebregiorgis A, Du Nguyen H, Yu J, Bishnoi R, Taouil M, Catthoor F and Hamdioui S (2022). A Survey on Memory-centric Computer Architectures, ACM Journal on Emerging Technologies in Computing Systems, 18:4, (1-50), Online publication date: 31-Oct-2022. Bang T, May N, Petrov I and Binnig C (2022). The full story of 1000 cores, The VLDB Journal — The International Journal on Very Large Data Bases, 31:6, (1185-1213), Online publication date: 1-Nov-2022. Resch S, Khatamifard S, Chowdhury Z, Zabihi M, Zhao Z, Cilasun H, Wang J, Sapatnekar S and Karpuzcu U (2022). Energy-efficient and Reliable Inference in Nonvolatile Memory under Extreme Operating Conditions, ACM Transactions on Embedded Computing Systems, 21:5, (1-36), Online publication date: 30-Sep-2022. Neto A, Neto J and Moreno E (2022). The development of a low-cost big data cluster using Apache Hadoop and Raspberry Pi. A complete guide, Computers and Electrical Engineering, 104:PA, Online publication date: 1-Dec-2022.Rosenbloom P Thoughts on Architecture Artificial General Intelligence, (364-373)Kopper P, Copplestone S, Pfeiffer M, Koch C, Fasoulas S and Beck A (2022) Hybrid parallelization of Euler-Lagrange simulations based on MPI-3 shared memory, Advances in Engineering Software, 174:C, Online publication date: 1-Nov-2022. Jiang Z, Yang K, Fisher N, Audsley N and Dong Z (2022). 
Towards an energy-efficient quarter-clairvoyant mixed-criticality system, Journal of Systems Architecture: the EUROMICRO Journal, 130:C, Online publication date: 1-Sep-2022. Baldassin A, Barreto J, Castro D and Romano P (2021). Persistent Memory, ACM Computing Surveys, 54:7, (1-37), Online publication date: 30-Sep-2022. Resch S and Karpuzcu U (2021). Benchmarking Quantum Computers and the Impact of Quantum Noise, ACM Computing Surveys, 54:7, (1-35), Online publication date: 30-Sep-2022. Shukla S, Bandishte S, Gaur J and Subramoney S Register file prefetching Proceedings of the 49th Annual International Symposium on Computer Architecture, (410-423) Paul A, Choi J, Karimi A and Wang F Machine Learning Assisted HPC Workload Trace Generation for Leadership Scale Storage Systems Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, (199-212) Wu N, Yang H, Xie Y, Li P and Hao C High-level synthesis performance prediction using GNNs Proceedings of the 59th ACM/IEEE Design Automation Conference, (49-54) Beckmann N, Gibbons P and McGuffey C Brief Announcement: Spatial Locality and Granularity Change in Caching Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, (173-175) Bura A, Rengarajan D, Kalathil D, Shakkottai S and Chamberland J (2021). Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms, IEEE/ACM Transactions on Networking, 30:1, (18-31), Online publication date: 1-Feb-2022. Li Y, Yu X, Yang Y, Zhou Y, Yang T, Ma Z and Chen S (2021). Pyramid Family: Generic Frameworks for Accurate and Fast Flow Size Measurement, IEEE/ACM Transactions on Networking, 30:2, (586-600), Online publication date: 1-Apr-2022. Kim H, Amarnath A, Bagherzadeh J, Talati N and Dreslinski R (2021).
A Survey Describing Beyond Si Transistors and Exploring Their Implications for Future Processors, ACM Journal on Emerging Technologies in Computing Systems, 17:3, (1-44), Online publication date: 31-Jul-2021.Bazzaz M, Hoseinghorban A and Ejlali A (2021). Fast and Predictable Non-Volatile Data Memory for Real-Time Embedded Systems, IEEE Transactions on Computers, 70:3, (359-371), Online publication date: 1-Mar-2021.Schuiki F, Zaruba F, Hoefler T and Benini L (2021). Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores, IEEE Transactions on Computers, 70:2, (212-227), Online publication date: 1-Feb-2021. Deep Reinforcement Learning for Delay-Oriented IoT Task Scheduling in SAGIN, IEEE Transactions on Wireless Communications, 20:2,

Branchboozle Proceedings of the 36th Annual ACM Symposium on Applied Computing, (1617-1625) Chen J, Lu C, Ni J, Guo X, Girard P and Cheng Y (2021). DOVA PRO: A Dynamic Overwriting Voltage Adjustment Technique for STT-MRAM L1 Cache Considering Dielectric Breakdown Effect, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29:7, (1325-1334), Online publication date: 1-Jul-2021. Das A, Jose J and Mishra P (2021). Data Criticality in Multithreaded Applications: An Insight for Many-Core Systems, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29:9, (1675-1679), Online publication […] Using Graph Neural Networks for Circuit Reverse Engineering 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), (1-9) Wu Y, Li J, Dai H, Yi X, Wang Y and Yang X micROS.BT: An Event-Driven Behavior Tree Framework for Swarm Robots 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (9146-9153) Mustard C, Goswami S, Gharavi N, Nider J, Beschastnikh I and Fedorova A Jumpgate Proceedings of the 14th ACM International Conference on Systems and Storage, (1-12) Min D and Kim Y Isolating namespace and performance in key-value SSDs for multi-tenant environments Proceedings of the 2021 IEEE/ACM 25th […] H, Liu F, Zheng Y, Zheng Y and Zhang S Exploiting Different Levels of Parallelism in the Quantum Control Microarchitecture for Superconducting Qubits MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, (898-911) LeMay M, Rakshit J, Deutsch S, Durham D, Ghosh S, Nori A, Gaur J, Weiler A, Sultana S, Grewal K and Subramoney S Cryptographic Capability Computing MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, (253-267) Skarlatos D, Zhao Z, Paccagnella R, Fletcher C and Torrellas J Jamais vu: thwarting microarchitecture […] Programming Languages and Operating Systems, (1061-1076) Zeitak A and Morrison A Cuckoo Trie Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, (147-162) Parra P, Guzmán D, Polo Ó, da Silva A, Martínez A, Sánchez S and Prieto M (2021).
Improving performance and determinism of multitasking systems on the LEON architecture, Microprocessors & Microsystems, 80:C, Online publication date: 1-Feb-2021.Lozano R and Schulte C (2019). Survey on Combinatorial Register Allocation and Instruction Scheduling, ACM Computing Surveys, 52:3, (1-50), Online publication date: 31-May-2020.Jeon Y, Park B, Kwon S, Kim B, Yun J and Lee D BiQGEMM Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-16)Zhang R, Biswas S, Balaji V, Bond M and Lucia B Peacenik Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, (317-333)Nguyen H, Yu J, Lebdeh M, Taouil M, Hamdioui S and Catthoor F (2020). A Classification of Memory-Centric Computing, ACM Journal of Parallel and Distributed Computing, 139:C, (135-147), Online publication date: 1-May-2020.Liu B, Cheshmi K, Soori S, Strout M and Dehnavi M MatRox Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (389-402)Lipp M, Schwarz M, Gruss D, Prescher T, Haas W, Horn J, Mangard S, Kocher P, Genkin D, Yarom Y, Hamburg M and Strackx R (2020). Meltdown, Communications of the ACM, 63:6, (46-56), Online publication date: 21-May-2020. Coffin E, Young S, Kaur H, Brown J, Pirvu M and Kent K MicroJIT Proceedings of the 30th Annual International Conference on Computer Science and Software Engineering, (179-188) Ritter F and Hack S PMEvo: portable inference of port mappings for out-of-

(911-925), Online publication date: 1-Feb-2021. Zhang J, Zhou X, Ge T, Wang X and Hwang T (2021). Joint Task Scheduling and Containerizing for Efficient Edge Computing, IEEE Transactions on Parallel and Distributed Systems, 32:8, (2086-2100), Online publication date: 1-Feb-2021. Moreira A, Ottoni G and Quintão Pereira F (2021). VESPA: static profiling for binary optimization, Proceedings of the ACM on Programming Languages, 5:OOPSLA, (1-28), Online publication date: 20-Oct-2021.Carvalho D and Seznec A (2021). Understanding Cache Compression, ACM Transactions on Architecture and Code Optimization, 18:3, (1-27), Online publication date: 30-Sep-2021.C. A, Lee W and Lin W

order processors by evolutionary optimization Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, (608-622)Park H, Ahn H and Jung S (2020). A Novel Matchline Scheduling Method for Low-Power and Reliable Search Operation in Cross-Point-Array Nonvolatile Ternary CAM, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28:12, (2650-2657), Online publication date: 1-Apr-2020. Jošilo S and Dán G (2020). Computation Offloading Scheduling for Periodic Tasks in Mobile Edge Computing, IEEE/ACM Transactions on Networking, 28:2, (667-680), Online publication date: 1-Apr-2020. Zhang Z, Henderson T, Karaman S and FSMI, International Journal of Robotics Research, 39:9, (1155-1177), Online publication date: 1-Aug-2020. Soft-HaT, ACM Transactions on Design Automation of Electronic Systems, 25:4, (1-22), Online publication date: 2-Sep-2020. Berg B, Berger D, McAllister S, Grosof I, Gunasekar S, Lu J, Uhlar M, Carrig J, Beckmann N, Harchol-Balter M and Ganger G The CacheLib caching engine Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, (769-786)Manjith B.C. and Ramasubramanian N. (2020). Securing AES Accelerator from Key-Leaking Trojans on FPGA, International Journal of Embedded

Energy-efficient Real-time Scheduling on Multicores, ACM Transactions on Embedded Computing Systems, 19:4, (1-25), Online publication date: 1-Apr-2020. Orts F, Ortega G, Puertas A, García I and Garzón E (2020). On solving the unrelated parallel machine scheduling problem: active microrheology as a case study, The Journal of Supercomputing, 76:11, (8494-8509), Online publication date: 1-Nov-2020. Mozafari S and Meyer B (2020). […] and Szymczyk P (2020). Automatic processing of Z-transform artificial neural networks using parallel programming, Neurocomputing, 379:C, (74-88), Online publication date: 28-Feb-2020. Damaj I, Elshafei M, El-Abd M and Aydin M (2022). An analytical framework for high-speed hardware particle swarm optimization, Microprocessors &

Microsystems, 72:C, Online publication date: 1-Feb-2020. Salazar C and Bobby Birrer M Instrumentation and Extension of reduced, simulated Single Cycle MIPS architecture to improve Student Comprehension 2020 IEEE Frontiers in Education Conference (FIE), (1-5) Wang M, Wang J, Wen C and Chao H Roadrunner: Autonomous Intersection Management with Dynamic Lane Assignment 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), (1-7) Atre N, Sherry J, Wang W and Berger D Caching with Delayed Hits Proceedings of the Annual conference […] architectures, and protocols for computer communication, (495-513) Salehnamadi N, Alshayban A, Ahmed I and Malek S ER catcher Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, (324-335) Lanuza J, Trabes G and Wainer G Parallel execution of DEVS in shared-memory multicore architectures Proceedings of the 2020 Spring Simulation Conference, (1-11) Vineyard C, Plagge M and Green S Comparing Neural Accelerators & Neuromorphic Architectures The False Idol of Operations Proceedings of the 2020 Annual Neuro-Inspired Computational
Elements Workshop, (1-6)Chen Y (2019). Reshaping Future Computing Systems With Emerging Nonvolatile Memory Technologies, IEEE Micro, 39:1, (54-57), Online publication date: 1-Jan-2019. Guo X, Wang H, Zhang C, Tang H and Yuan Y Leakage-aware thermal management for multi-core systems using piecewise linear model based predictive control Proceedings of the 24th Asia and South Pacific Design Automation Conference, (64-69)Shelor C and Kavi K Reconfigurable dataflow graphs for processing-in-memory Proceedings of the 20th International Conference on Distributed Computing and Networking (110-119)Rhisheekesan A, Jeyapaul R and Shrivastava A (2019). Control Flow Checking or Not? (for Soft Errors), ACM Transactions on Embedded Computing Systems, 18:1, (1-25), Online publication in network-on-chips Proceedings of the ACM International Conference on Supercomputing, (217-226)Coffin E, Young S, Kent K and Pirvu M A roadmap for extending MicroJIT Proceedings of the 23rd IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, (146-153)Al-Adwan A, Sharieh A and Mahafzah B (2019). Parallel heuristic local search algorithm on OTIS hyper hexa-cell and OTIS mesh of trees optoelectronic architectures, Applied Intelligence, 49:2, (661-688), Online publication date: 1-Feb-2019.Li F, Xu L, Duan S, Wu W, Zhao H and Ling Q (2019). Improving hierarchical mobile video caching through distributed cross-layer coordination, Multimedia Tools and Applications, 78:5, (6049-6071), Online publication date: 1-Mar-2019.Reichenbach M, Holzinger P, Häublein K, Lieske T, Blinzer P and Fey D (2019). Heterogeneous Computing Utilizing FPGAs Journal of Signal Processing Systems, 91:7, (745-757), Online publication date: 1-Jul-2019. Hou Y, He H, Shamsi K, Jin Y, Wu D and Wu H (2019). 
On-Chip Analog Trojan Detection Framework for Microprocessor Trustworthiness, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38:10, (1820-1830), Online publication date: 1-Oct-2019.Zaruba F and Benini L (2019). The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27:11, (2629-2640), Online publication date: 1-Nov-2019.Pontarelli S, Bonola M and Bianchi G (2018). Smashing OpenFlow's "atomic" actions, International Journal of Network Management, 29:1, Online publication and Instruction Scheduling, ACM Transactions on Programming Languages and Systems, 41:3, (1-53), Online publication date: 30-Sep-2019.Geng T, Wang T, Wu C, Yang C, Wu W, Li A and Herbordt M O3BNN Proceedings of the ACM International Conference on Supercomputing, (461-472)Ponugoti M and Milenkovic A (2019). Enabling On-the-Fly Hardware Tracing of Data Reads in Multicores, ACM Transactions on Embedded Computing Systems, 18:4, (1 27), Online publication date: 31-Jul-2019. Moreira F, Oliveira D and Navaux P SPADA Proceedings of the 16th ACM International Conference on Computing Frontiers, (50-58) Nongpoh B, Ray R, Das M and Banerjee A (2019). Enhancing Speculative Execution With Selective Approximate Computing, ACM Transactions on Design Automation of Electronic Systems, 24:2, (1-29), Online publication date: 21-Mar-2019. 
Gurung A and Ray R Simultaneous Solving of Batched Linear Programs on a GPU Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, (59-66) Jordan H, Subotić P, Zhao D and Scholz B A specialized B-tree for concurrent datalog evaluation Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, (327-339) Pittino F, Benini L and Cavazzoni C Prediction of Time-to-Solution in Material Science Simulations Using Deep Learning Proceedings of the Platform for Advanced Scientific Computing Conference, (1-9) Ying B, Yuan K and Sayed A (2019). Supervised Learning Under Distributed Features, IEEE Transactions on Signal Processing, 67:4, (977-992), Online publication date: 1-Feb-2019. Calciu I, Puddu I, Kolli A, Nowatzyk A, Gandhi J, Mutlu O and Subrahmanyam P Project PBerry Proceedings of the Workshop on Hot Topics in Operating Systems, (127-135) Wang L, Gao W, Yang K and Jiang Z BOPS, A New Computation-Centric Metric for Datacenter Computing Benchmarking, Measuring, and Optimizing, (262-277) Edelkamp S and Weiß A (2019). BlockQuicksort, ACM Journal of Experimental Algorithmics, 24, (1-22), Online publication date: 17-Dec-2019. Liu Z, Nath A, Ding X, Fu H, Muhib Khan M and Yu W (2022). Multivariate modeling and two-level scheduling of analytic queries, Parallel Computing, 85:C, (66-78), Online publication […] and scheduling of System Architecture: the EUROMICRO Journal, 98:C, (63-78), Online publication date: 1-Sep-2019. Nadeau D, Ezzati-Jivan N and Dagenais M (2019). Efficient large-scale heterogeneous debugging using dynamic tracing, Journal of Systems Architecture: the EUROMICRO Journal, 98:C, (346-360), Online publication date: 1-Sep-2019. García-Martín E, Rodrigues C, Riley G and Grahn H (2019).
Estimation of energy consumption in machine learning, Journal of Parallel and Distributed Computing, 134:C, (75-88), Online publication date: 1-Dec-2019.Li G, Yang Y, Le F, Lim Y and Wang J Update Algebra: Toward Continuous, Non-Blocking Composition of Network Updates in SDN IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, (1081-1089)Sperl P and Böttinger K Side-Channel Aware Fuzzing Computer Security - ESORICS 2019, (259-278)Van Sandt P, Chronis Y and Patel J Efficiently Searching In-Memory Sorted Arrays Proceedings of the 2019 International Conference on Management of Data, (36-53)Jošilo S and Dán G (2018). Selfish Decentralized Computation Offloading for Mobile Cloud Computing in Dense Wireless Networks, IEEE Transactions on Mobile Computing Surveys of End-System Optimizations for High-Speed Networks, ACM Computing Surveys 51:3, (1-36), Online publication date: 31-May-2019. Park S, Wu Y, Lee J, Aupov A and Mahlke S (2019). Multi-objective Exploration for Practical Optimization date: 31-Oct-2019. Castro-Godínez J, Shafique M and Henkel J (2019). ECAx, ACM Transactions on Embedded Computing Systems, 18:5s, (1-20), Online publication date: 31-Oct-2019.Real P, Molina-Abril H, Díaz-del-Río F, Blanco-Trejo S and Onchis D Enhanced Parallel Generation of Tree Structures for the Recognition of 3D Images Pattern Recognition, (292-301)Ayers G, Nagendra N, August D, Cho H, Kanev S, Kozyrakis C, Krishnamurthy T, Litz H, Moseley T and Ranganathan P AsmDB Proceedings of the 46th International Symposium on Computer Architecture, (462-473)Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C and Torrellas J InvisiSpec Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, (428-441)Dey M Nazari A, Zajic A and Prvulovic M TEMProf Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, (881-893)Prakash A, Clarke C, Lam S and Srikanthan T (2018). 
Rapid Memory-Aware Selection of Hardware Accelerators in Programmable SoC Design, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26:3, (445-456), Online publication date: 1-Mar-2018. Malas T, Hager G, Ltaief H and Keyes D (2017). Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations, ACM Transactions on Parallel Computing, 4:3, (1-32), Online publication date: 27-Apr-2018. Morse J, Kerrison S and Eder K (2018). On the Limitations of Analyzing Worst-Case Dynamic Energy of Processing, ACM Transactions on Embedded Computing Systems, 17:3, (1-22), Online publication date: 31-May-2018.Lee J, Kim C, Lin K, Cheng L, Govindaraju R and Kim J WSMeter Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, (549-563) Josipović L, Ghosal R and Ienne P Dynamically Scheduled High-level Synthesis Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, (127-136) Chen K and Chen C (2018). Enabling SIMT Execution Model on Homogeneous Multi-Core System, ACM Transactions on Architecture and Code Optimization, 15:1, (1-26), Online publication date: 31-Mar-2018. Baba T, Watanabe S, Jackin B, Ohkawa T, Ootsu K, Yokota T, Hayasaki Y and Yatagai T Overcoming the difficulty of large-scale CGH generation on multi-GPU cluster Proceedings of the 11th Workshop on General Purpose GPUs, (13-21)Kwon K, Amid A, Gholami A Wu B, Asanovic K and Keutzer K Co-design of deep neural nets and neural net accelerators for embedded vision applications Proceedings of the 55th Annual Design Automation Conference, (1-6)Crawford P, Barnes Jr. 
P, Eidenbenz S and Wilsey P Sampling Simulation Model Profile Data for Analysis Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, (17-28) Kelefouras V and Djemame K A methodology for efficient code optimizations and memory management Proceedings of the 2018 International Conference on Computing Frontiers, (105-112) Kougkas A, Devarajan H and Sun X IRIS Proceedings of the 2018 International Conference on Supercomputing, (33-42) Zhang J and Gruenwald L Regularizing irregularity Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), (1-8) Zoni D, Barenghi A, Pelosi G and Fornaciari W (2018). A Comprehensive Side-Channel Information Leakage Analysis of an In-Order RISC CPU Microarchitecture, ACM Transactions on Design Automation of Electronic Systems, 23:5, (1-30), Online publication […] Framework for Embedded Out-of-Order Processors Using Software Characteristics, ACM Transactions on Embedded Computing Systems, 17:4, (1-25), Online publication date: 29-Aug-2018. Ognawala S, Amato R, Pretschner A and Kulkarni P Automatically assessing vulnerabilities discovered by compositional analysis Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis, (16-25) Khattab O, Hammoud M and Shekfeh O PolyHJ Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (1323-1332) Rashid S, Nelissen G and Tovar E Trading Between Intra- and Inter-Task Cache Interference to Improve Schedulability Proceedings of the 26th International Conference on Real-Time Networks and Systems, (125-136) Einziger G, Eytan O, Friedman R and Manes B Adaptive Software Cache Management Proceedings of the 19th International Middleware Conference, (94-106) Jimenez L and Agyeman M A Study of Techniques to Increase Instruction Level Parallelisms Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control, (1-5) Lee J, Kim C, Lin K, Cheng L, Govindaraju R and Kim J (2018). WSMeter, ACM SIGPLAN Notices, 53:2, (549-563), Online publication date: 30-Nov-2018. Zhang J, Wu C, Yang D, Chen Y, Meng X, Xu L and Guo M (2018).
HSCS, Frontiers of Computer Science: Selected Publications from Chinese Universities, 12:6, (1090-1104), Online publication date: 1-Dec-2018. Liao C, Lee S, Chiou Y, Lee C and Lee C (2018).
Power consumption minimization by distributive particle swarm optimization for luminance control and its parallel implementations, Expert Systems with Applications: An International Journal, 96:C, (479-491), Online publication date: 15-Apr-2018.Breβ S, Köcher B, Funke H, Zeuch S, Rabl T and Markl V (2018). Generating custom code for efficient query execution on heterogeneous processors, The VLDB Journal — The International Journal on Very Large Data Bases, 27:6, (797-822), Online publication date: 1-Dec-2018.Al-Adwan A, Mahafzah B and Sharieh A (2018). Solving traveling salesman problem using parallel repetitive nearest neighbor algorithm on OTIS-Hypercube and OTIS-Mesh optoelectronic architectures, The Journal of Supercomputing, 74:1, (1-36), Online publication date: 1-Jan-2018. Siddique N, Grubel P, Badawy A and Cook J (2018). A performance study of the time-varying cache behavior, The Journal of Supercomputing 74:2, (665-695), Online publication date: 1-Feb-2018. Jakovljević R, Berić A, Van Dalen E and Milićev D (2018). New access modes of parallel memory subsystem for sub-pixel motion date: 1-Feb-2018. Jakovljević R, Berić A, Van Dalen E and Milićev D (2018). Slicing from formal semantics, International Journal on Software Tools for Technology Transfer (STTT), 20:6, (739-769), Online publication date: 1-Nov-2018. Schulz L, Broneske D and Saake G (2018). An eight-dimensional systematic evaluation of optimized search algorithms on modern processors, Proceedings of the VLDB Endowment, 11:11, (1550-1562), Online publication date: 1-Jul-2018. Jiang Z, Gao W, Wang L, Xiong X, Zhang Y, Wen X, Luo C, Ye H, Lu X, Zhang Y, Feng S, Li K, Xu W and Zhan J HPC AI500: A Benchmark Suite for HPC AI Systems Benchmarking, Measuring, and Optimizing, (10-22) Dolbeau R (2018). 
Theoretical peak FLOPS per instruction set: a tutorial, The Journal of Supercomputing, 74:3, (1341-1377), Online publication date: 1-Mar-2018. Catalán S, Herrero J, Quintana-Ortí E and Rodríguez-Sánchez R (2018). Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration, IEEE Transactions on Parallel and Distributed Systems, 29:9, (2105-2120), Online publication date: 1-Sep-2018. Kwon K, Amid A, Gholami A, Wu B, Asanovic K and Keutzer K Invited: Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), (1-6) Psychou G, Rodopoulos D, Sabry M, Gemmeke T, Atienza D, Noll T and Catthoor F (2017). Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems, ACM Computing Surveys, 50:4, (1-38), Online publication date: 31-Jul-2018. Tan W, Chang S, Fong L, Li C, Wang Z and Cao L Matrix Factorization on GPUs with Memory Optimization and Approximate Computing Proceedings of the 47th International Conference on Parallel Processing, (1-10) Bae D, Jo I, Choi Y, Hwang J, Cho S, Lee D and Jeong J 2B-SSD Proceedings of the 45th Annual International Symposium on Computer Architecture, (425-438) Parasar M, Bhattacharjee A and Krishna T SEESAW Proceedings of the 45th Annual International Symposium on Computer Architecture, (193-206) Tran K, Jimborean A, Carlson T, Koukos K, Själander M and Kaxiras S (2018). 
SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores, ACM SIGPLAN Notices, 53:4, (328-343), Online publication date: 2-Dec-2018. Tran K, Jimborean A, Carlson T, Koukos K, Själander M and Kaxiras S SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, (328-343) Melani A, Bertogna M, Davis R, Bonifaci V, Marchetti-Spaccamela A and Buttazzo G (2017). Exact Response Time Analysis for Fixed Priority Memory-Processor Co-Scheduling, IEEE Transactions on Computers, 66:4, (631-646), Online publication date: 1-Apr-2017. Chow K and Zhu W Software Performance Analytics in the Cloud Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, (419-421) Liu Y and Sun X (2017). 
Evaluating the Combined Effect of Memory Capacity and Concurrency for Many-Core Chip Design, ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 2:2, (1-25), Online publication date: 5-May-2017. Zhang Y.

Architecture News, 45:2, (295-306), Online publication date: 14-Sep-2017. Sieber C, Durner R, Ehm M, Kellerer W and Sharma P Towards optimal adaptation of NFV packet processing to modern CPU memory architectures Proceedings of the 2nd Workshop on Cloud-Assisted Networking, (7-12)Tran K, Carlson T, Koukos K, Själander M, Spiliopoulos V, Kaxiras S and Jimborean A Clairvoyance: look-ahead compile-time scheduling Proceedings of the 2017 International Symposium on Code Generation and Optimization, (171-184)Crawford P, Eidenbenz S, Barnes P and Wilsey P Some properties of communication behaviors in discrete-event simulation Conference, (1-12)Mai V and Khalil I (2017). Design and implementation of a secure cloud-based billing model for smart meters as an Internet of things using homomorphic cryptography Future Generation Computer Systems, 72:C, (327-338), Online publication date: 1-Jul-2017. Tang Q, Basten T, Geilen M, Stuijk S and Wei J (2017). Mapping of synchronous dataflow graphs on MPSoCs based on parallelism enhancement, Journal of Parallel and Distributed Computing, 101:C, (79-91), Online publication date: 1-Mar-2017.He H, Cui L, Zhou F and Wang D (2017). Distributed proxy cache technology based on autonomic computing in smart cities, Future Generation Computer Systems, 76:C, (370-383), Online publication date: 1-Nov-2017. Gutierrez-Alcoba A, Ortega G, Hendrix E and Garca I (2017). Accelerating an algorithm for perishable inventory control on heterogeneous platforms, Journal of Parallel and Distributed Computing, 104:C, (12-18), Online publication date: 1-Jun-2017. Brandalero M and Beck A A mechanism for energy-efficient reuse of decoding and scheduling of x86 instruction streams Proceedings of the Conference on Design, Automation & Test in Europe, (1472-1477)Qin H, Liu Z, Liu Y and Zhong H (2017). An object-oriented MATLAB toolbox for automotive body conceptual design using distributed parallel optimization, Advances in Engineering

OpenMP parallelization of a gridded SWAT (SWATG), Computers & Geosciences, 109:C, (228-237), Online publication date: 1-Dec-2017. Chen Q, Wang X, Wan H and Yang R (2017). A Logic Circuit Design for Perfecting Memristor-Based Material Implication, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 36:2, (279 and 1995).

Anwer B, Gopalakrishnan V, Han B, Reich J, Shaikh A and Zhang Z ParaBox Proceedings of the Symposium on SDN Research, (143-149) Palangappa P and Mohanram K (2017). CompEx++, ACM Transactions on Architecture and Code Optimization, 14:1, (1-30), Online publication date: 14-Apr-2017. Gupta S and Wilsey P Quantitative Driven Optimization of a Time Warp Kernel Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, (27-38)Paredes M, Riley G and Luján M Vectorization of Hybrid Breadth First Search on the Intel Xeon Phi Proceedings of the Computing Frontiers Conference, (127-135)Stanic M, Palomar O, Hayes T, Ratkovic I Cristal A, Unsal O and Valero M (2017). An Integrated Vector-Scalar Design on an In-Order ARM Core, ACM Transactions on Architecture and Code Optimization for SIMT GPUs Proceedings of the 44th Annual International Symposium on Computer Architecture, (295-306)Tsai P, Beckmann N and Sanchez D Jenga Proceedings of the 44th Annual International Symposium on Computer Architecture, (652-665)Wickerson J, Batty M, Sorensen T and Constantinides G (2017). Automatically comparing memory consistency models, ACM SIGPLAN Notices, 52:1, (190-204), Online publication date: 11-

U, Villa O, Bolotin E, Arunkumar A, Ebrahimi E, Jaleel A, Ramirez A and Nellans D Beyond the socket Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, (123-135) Huang Y, Guo N, Seok M, Tsividis Y, Mandli K and Sethumadhavan S Hybrid analog-digital solution of nonlinear partial differential equations Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, (665-678) Kulkarni C, Kesavan A, Zhang T, Ricci R and Stutsman R Rocksteady Proceedings of the 26th Symposium on Operating Systems Principles, (390-405) Wang K and Lin C (2017). Decoupled Affine Computation for SIMT GPUs, ACM SIGARCH Computer

284), Online publication date: 1-Feb-2017. Deng S and Suresh K (2017). Topology optimization under thermo-elastic buckling, Structural and Multidisciplinary Optimization date: 1-Feb-2017. RT-CUDA, International Journal of Parallel Programming, 45:3, (551-594), Online publication date: 1-Jun-2017.Ortega G, Filatovas E, Garzón E and Casado L (2017). Non-dominated sorting procedure for Pareto dominance ranking on multicore CPU and/or GPU, Journal of Global Optimization, 69:3, (607-627), Online publication date: 1-Nov-2017.Ortega G, Puertas A and Garzón E (2017). Accelerating the problem of microrheology in colloidal systems on a GPU, The Journal of Supercomputing, 73:1, (370-383), Online publication and Migration in DRAM-PCM Hybrid Main Memory, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 36:9, (1458-1470), Online publication date: 1-Sep-2017. Alioto M Energy-quality scalable adaptive VLSI circuits and systems beyond approximate computing Proceedings of the Conference on Design, Automation & Test in Europe, (127-132) Blohoubek J, Fier P and Schmidt J (2017). Error masking method based on the shortduration offline test, Microprocessors & Microsystems, 52:C, (236-250), Online publication date: 1-Jul-2017. Wan H, Gao X, Long X and Jiang B Introducing parallel computing concepts in computer system related courses 2017 IEEE Frontiers in Education Conference (FIE), (1-7)Chen X, Wardi Y and Yalamanchili S Power regulation in high performance multicore processors 2017 IEEE 56th Annual Conference on Decision and Control (CDC), (2674-2679) Wickerson J, Batty M, Sorensen T and Constantinides G Automatically comparing memory consistency models Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, (190-204) Kleanthous M Sazeides Y, Ozer E, Nicopoulos C, Nikolaou P and Hadjilambrou Z (2016). 
Toward Multi-Layer Holistic Evaluation of System Designs, IEEE Computer Architecture Letters, 15:1, (58-61), Online publication date: 1-Jan-2016. Quéva C, Couroussé D and Charles H Self-optimisation using runtime code generation for wireless sensor networks Proceedings of the 17th International Conference on Distributed Computing and Networking, (1-6) Bijo S, Johnsen E, Pun K and Tarifa S An operational semantics of cache coherent multicore architectures Proceedings of the 31st Annual ACM Symposium on Applied Computing, (1219-1224) Madarbux M, Van Laer A, Watts P and Jones T Energy Efficient And Low Latency Interconnection Network For Multicast Invalidates In Shared Memory Systems Proceedings of the 1st International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, (1-6) Fadolalkarim D, Sallam A and Bertino E PANDDE Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, (267-276) Goossens B, Parello D, Porada K and Rahmoune D Parallel Locality and Parallelization Quality Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, (59-68) Darav N, Kennings A, Tabrizi A, Westwick D and Behjat L (2016). Eh?Placer, ACM Transactions on Design Automation of Electronic Systems, 21:3, (1-27), Online publication date: 26-Jul-2016. 
Wilsey P Some Properties of Events Executed in Discrete-Event Simulation, (165-176) Luppold A, Kittsteiner C and Falk H Cache-Aware Instruction SPM Allocation for Hard Real-Time Systems Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, (77-85) Banerjee K, Banerjee S and Sarkar S Data-race detection: the missing piece for an end-to-end semantic equivalence checker for parallelizing transformations of array-intensive programs Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, (1-8) Tran K Student Research Poster Proceedings of the 2016 International Workshop on Libraries, Languages, and Compilers for Array Programming, (1-8) Tran K Coherence Protocols for Multi-core Architectures Proceedings of the International Conference on Advances in Information Communication Technology & Computing, (1-7) Siegl P, Buchty R and 
Berekovic M Data-Centric Computing Reineke J Enabling Compositionality for Multicore Timing Analysis Proceedings of the 24th International Conference on Real-Time Networks 
and Systems, (299-308) Fernandes F, Weigel L, Jung C, Navaux P, Carro L and Rech P (2016). Evaluation of Histogram of Oriented Gradients Soft Errors Criticality for Automotive Applications, ACM Transactions on Architecture and Code Optimization, 13:4, (1-25), Online publication date: 28-Dec-2016. Bederián C and Wolovick N A project-based HPC course for single-box computers Proceedings of the Workshop on Education for High Performance Computing, (1-6) Sewall J, Pennycook S, Duran A, Tian X and Narayanaswamy R A modern memory management

system for OpenMP Proceedings of the Third International Workshop on Accelerator Programming Using Directives, (25-35) Johnson P and Ekstedt M (2016). The Tarpit - A general theory of software engineering, Information and Software Technology, 70:C, (181-203), Online publication date: 1-Feb-2016. Savidis I, Ciftcioglu B, Xu J, Hu J, Jain M, Berman R, Xue J, Liu P, Moore D, Wicks G, Huang M, Wu H and Friedman E (2016). Heterogeneous 3-D circuits, Microelectronics Journal, 50:C, (66-75), Online publication date: 1-Apr-2016. Souza J, Carro L, Rutzig M and Beck A A reconfigurable heterogeneous multicore with a homogeneous ISA Proceedings of the 2016 Conference on Design, Automation & Test in Europe, (1598-1603) Yao Y and Lu Z Memory-access aware DVFS for network-on-chip in CMPs Proceedings of the 2016 Conference on Design, Automation & Test in Europe, (1433-1436)Brock J and Bruce R (2016). Power labs, Journal of Computing Sciences in Colleges, 32:2, (104-110), Online publication date: 1-Dec-2016.Elkhouly R, El-Mahdy A and Elmasry A Optimality analysis of if-conversion transformation Proceedings of the 24th High Performance Computing Symposium, (1-8)Lee Y, Kim J, Jang H, Yang H, Kim J, Jeong J and Lee J (2015). A fully associative, tagless DRAM cache, ACM SIGARCH Computer Architecture News, 43:3S, (211-222), Online publication date: 4-Jan-2016.Masliah I, Abdelfattah A, Haidar A, Tomov S, Baboulin M, Falcou J and Lee J (2015). Dongarra J High-Performance Matrix-Matrix Multiplications of Very Small Matrices Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, (659-671)Catalán S, Malossi A, Bekas C and Quintana-Ortí E The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8 Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, (103-116)Dai Y, Fang Y, Yang L and Jeon G (2016). 
Graphics processing unit-accelerated joint-bitplane belief propagation algorithm in DSC, The Journal of Supercomputing, 72:6, (2351-2375), Online publication date: 1-Jun-2016. Kanev S, Darago J, Hazelwood K, Ranganathan P, Moseley T, Wei G and Brooks D (2015). Profiling a warehouse-scale computer, ACM SIGARCH Computer Architecture News, 43:3S, (158-169), Online publication date: 4-Jan-2016. Hong J and Kim S (2016). Flexible ECC Management for Low-Cost Transient Error Protection of Last-Level Caches, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24:6, (2152-2164), Online publication date: 1-Jun-2016. Unal E and Savas E (2016). On Acceleration and Scalability of Number Theoretic Private Information Retrieval, IEEE Transactions on Parallel and Distributed Systems, 27:6, (1727-1741), Online publication date: 1-Jun-2016. Unal E and Savas E (2016). Toward a Parallel Turing Machine Model Network and Parallel Computing, (191-204) Shahvarani A and Jacobsen H A Hybrid B+-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing, (191-204)

automata algorithm for the deterministic simulation of 3-D multicellular tissue growth, Cluster Computing, 18:4, (1561-1579), Online publication date: 1-Dec-2015. Fox A and Patterson D (2015). Do-it-yourself textbook publishing, Communications of the ACM, 58:2, (40-43), Online publication date: 28-Jan-2015. Mozafari S, Meyer B and Skadron K Yield-aware Performance-Cost Characterization for Multi-Core SIMT Proceedings of the 25th edition on Great Lakes Symposium on VLSI, (237-240) Kandemir M, Zhao H, Tang X and Karakoy M Memory Row Reuse Distance and its Role in Optimizing Application Performance Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, (137-149)Kanev S, Darago J, Hazelwood K, Ranganathan P, Moseley T, Wei G and Brooks D Profiling a warehouse-scale computer Proceedings of the 42nd Annual International Symposium on Computer Architecture, (158-169)Zhang J, You S and Gruenwald L (2015). Large-scale spatial data processing on GPUs and GPU-accelerated clusters, SIGSPATIAL Special, 6:3, (27-34), Online publication date: 22-Apr-2015. Abadal S, Nemirovsky M, Alarcón E and Cabellos-Aparicio A Networking Challenges and Prospective Impact of Broadcast-Oriented Wireless Networks-on-Chip, (1-8), Online publication and Gupta P (2015). DPCS, ACM Transactions on Architecture and Code Optimization, 12:3, (1-26), Online publication date: 6-Oct-2015.Ul-Abdin Z and Svensson B Towards teaching embedded parallel computing Proceedings of the Workshop on Computer Architecture Education, (1-6)Kandemir M, Zhao H, Tang X and Karakoy M (2015). Memory Row Reuse Distance and its Role in Optimizing Application Performance, ACM SIGMETRICS Performance Evaluation Review, 43:1, (137-149), Online publication date: 24-Jun-2015. Jacobs M, Hahn S and Hack S WCET analysis for multi-core processors with shared buses and event-driven bus arbitration Proceedings of the 23rd International

Conference on Real Time and Networks Systems, (193-202)Zhang J, You S and Xia Y Prototyping A Web-based High-Performance Visual Analytics Platform for Origin-Destination Data Proceedings of the 1st International ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics, (16-23)Zhang J, You S and Gruenwald L Efficient Parallel Zonal Statistics on Large-Scale Global Biodiversity Data on GPUs Proceedings of the 4th International ACM SIGSPATIAL Workshop on Analytics for Big Geospatial Data, (35-44)Kiran D, Gurunarayanan S, Misra J and Nawal A (2015). Global scheduling heuristics for multicore architecture, Scientific Programming, 2015, (18-18), Online publication date: 1-Jan-2015. Damodaran P, Zaib A, Wallentowitz S, Wild T and Herkersdorf A Sharer status-based caching in tiled multiprocessor systems-on-chip Proceedings of the Symposium on High-Performance Parallel and Distributed Computing, (101-106)Cilku B, Kammerer R and Puschner P (2015). Aligning single path loops to reduce the number of capacity cache misses, ACM SIGBED Review, 12:1, (13-18), Online publication date: 27-Mar-2015.Liu Y and Sun X C2-bound Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)Li W, Jin G, Cui X and See S An evaluation of unified memory technology on NVIDIA GPUs Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (1092-1098)Eslami H, Kougkas A, Kotsifakou M, Kasampalis T, Feng K, Lu Y, Gropp W, Sun X, Chen Y and Thakur R Efficient diskto-disk sorting Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, (1-8) Cilku B and Puschner P (2015). Designing a time predictable memory hierarchy for single-path code, ACM SIGBED Review, 12:2, (16-21), Online publication date: 20-May-2015. Tan Z, Qian Z, Chen X, Asanovic K and Patterson D (2015). DIABLO, ACM SIGARCH Computer Architecture News, 43:1, (207-221), Online publication date: 29-May-2015. 
Tan Z, Qian Z, Chen X, Asanovic K and Patterson D DIABLO Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, (207-221) Gallenmüller S, Emmerich P, Wohlfart F, Raumer D and Carle G Comparison of Frameworks for High-Performance Packet IO Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for networking and communications systems, (29-38)Fang Y, Hoang T, Becchi M and Chien A Fast support for unstructured data processing Proceedings of the 48th International Symposium on Microarchitecture, (533-545)Chaker H, Cudennec L, Dahmani S, Gogniat G and Sepúlveda M Cycle-based Model to Evaluate Consistency Protocols within a Multi-protocol Compilation Tool-chain Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, (1-10) Altamimi M and Naik K A Computing Profiling Procedure for Mobile Developers to Estimate Energy Cost Proceedings of the 18th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, (301-305)Diaz I, Zhang C, Hollevoet L, Svensson J, Rodriques J, Wilhelmsson L, Olsson T, Van der Perre L and Öwall V (2015). A new digital front-end for flexible reception in software defined radio, Microprocessors & Microsystems, 39:8, (889-900), Online publication date: 1-Nov-2015. Zhu F, Yao Y, Tang W and Chen D (2015). A high performance framework for modeling and simulation of large-scale complex systems, Future Generation Computer Systems, 51:C. (132-141), Online publication date: 1-Oct-2015, Gadouleau M and Riis S (2015), Memoryless computation, Theoretical Computer Science, 562:C. (129-145), Online publication date: 11-Jan-2015, Carretero J. Distefano S.

State-of-the-Art in GPU-Based Large-Scale Volume Visualization, Computer Graphics Forum, 34:8, (13-37), Online publication date: 1-Dec-2015.Subedi T, Nguyen K and Cheriet M (2015). OpenFlow-based in-network Layer-2 adaptive multipath aggregation in data centers, Computer Communications, 61:C, (58-69), Online publication date: 1-May-2015.Lazarescu M and Lavagno L (2015), Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs, ACM Transactions on Embedded Computing Systems, 14:1, (1-20), Online publication date: 21-Jan-2015. Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan and Meihui Zhang (2015), In-Memory Big Data Management and Processing: A Survey, IEEE Transactions on Knowledge and Data Engineering, 27:7, (1920-1948), Online publication date: 1-Jul-2015. Sanchez E and Reorda M (2015). On the Functional Test of Branch Prediction Units, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23:9, (1675-1688), Online publication date: 1-Sep-2015. Lai B, Kuan-Ting Chen and Ping-Ru Wu (2015). A High-Performance Double-Layer Counting Bloom Filter for Multicore Systems, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23:11, (2473-2486), Online publication date: 1-Nov-2015.Li Wang, Minqi Zhou, Zhenjie Zhang, Minqi Zhou, Zhenjie Zhang, Minqi Zhou, (2015). NUMA-Aware Scalable and Efficient In-Memory Aggregation on Large Domains, IEEE Transactions on Knowledge and Data Engineering, 27:4, (1071-1084), Online publication date: 1-Apr-2015.Oxley M, Pasricha S, Khemka B, Ramirez A and Zou Y (2015). 
Printed Text

1. Fundamentals of Quantitative Design and Analysis
2. Memory Hierarchy Design
3. Instruction-Level Parallelism
4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
5. Multiprocessors and Thread-Level Parallelism
6. The Warehouse-Scale Computer
7. Domain Specific Architectures

- A. Instruction Set Principles
- B. Review of Memory Hierarchy
- C. Pipelining: Basic and Intermediate Concepts

Online

- D. Storage Systems
- E. Embedded Systems
- F. Interconnection Networks
- G. Vector Processors
- H. Hardware and Software for VLIW and EPIC
- I. Large-Scale Multiprocessors and References