- This glossary describes computer floating-point arithmetic
terms. It also describes terms and acronyms associated with parallel
processing.
Note - The symbol "||" appended to a term
designates it as associated with parallel processing.
- Accuracy is a measure of the extent to which a result is
affected by error. Contrast with precision. For example, "the
result is accurate to six decimal places" implies that all the
errors incurred while calculating the result are not large enough to
change the sixth decimal place of the result.
- A number of processors working simultaneously, each handling
one element of the array, so that a single operation can apply to all
elements of the array in parallel.
- See cache, direct mapped cache,
fully associative cache, set associative cache.
- Computer control behavior where a specific operation is
begun upon receipt of an indication (signal) that a particular event
has occurred. Asynchronous control relies on synchronization mechanisms
called locks to coordinate processors. See mutual exclusion,
mutex lock, semaphore lock, single-lock
strategy, spin lock.
- See MBus, multiprocessor bus, XDBus.
- A synchronization mechanism for coordinating tasks even when
data accesses are not involved. A barrier is analogous to a gate.
Processors or threads operating in parallel reach the gate at different
times, but none can pass through until all processors reach the gate.
For example, suppose at the end of each day, all bank tellers are
required to tally the amount of money that was deposited, and the
amount that was withdrawn. These totals are then reported to the bank
vice president, who must check the grand totals to verify that debits equal
credits. The tellers operate at their own speeds; that is, they finish
totaling their transactions at different times. The barrier mechanism
prevents tellers from leaving for home before the grand total is
checked. If debits do not equal credits, all tellers must return to
their desks to find the error. The barrier is removed after the vice
president obtains a satisfactory grand total.
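The teller analogy can be sketched with the POSIX barrier API (`pthread_barrier_t`). That API is an assumption here: the glossary does not name one, and barriers are an optional POSIX feature that not every system provides. The teller count, the tallies, and the names `teller` and `close_the_day` are hypothetical, and the "return to desks on a mismatch" step is omitted.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_barrier_t */
#include <pthread.h>
#include <stddef.h>

#define NTELLERS 4

static pthread_barrier_t end_of_day;   /* the "gate" every teller must reach */
static long deposits[NTELLERS];        /* per-teller tallies (hypothetical data) */
static long grand_total;

static void *teller(void *arg)
{
    long id = (long)arg;
    deposits[id] = (id + 1) * 100;      /* tally this teller's transactions */

    /* No teller gets past this call until all NTELLERS have arrived. */
    int rc = pthread_barrier_wait(&end_of_day);
    if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
        /* Exactly one waiter receives this value; let it play the
           vice president and add up every teller's tally. */
        long sum = 0;
        for (int i = 0; i < NTELLERS; i++)
            sum += deposits[i];
        grand_total = sum;
    }
    return NULL;
}

long close_the_day(void)
{
    pthread_t t[NTELLERS];
    pthread_barrier_init(&end_of_day, NULL, NTELLERS);
    for (long i = 0; i < NTELLERS; i++)
        pthread_create(&t[i], NULL, teller, (void *)i);
    for (int i = 0; i < NTELLERS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&end_of_day);
    return grand_total;
}
```

With these made-up tallies, `close_the_day()` returns 1000 (100 + 200 + 300 + 400), no matter in which order the tellers finish.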
- The sum of the base-2 exponent and a constant (bias) chosen
to make the stored exponent's range non-negative. For example, the
exponent of 2^-100 is stored in IEEE single precision format
as (-100) + (single precision bias of 127) = 27.
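The example above can be checked directly by reading the 8-bit exponent field out of a float's encoding. This minimal sketch assumes that `float` is IEEE single precision, as it is on SPARC; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the stored (biased) exponent field of an IEEE single-precision
   value: bits 30..23 of its 32-bit encoding. */
unsigned biased_exponent(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bytes without aliasing issues */
    return (bits >> 23) & 0xFFu;
}
```

For example, `biased_exponent(0x1p-100f)` yields 27, and `biased_exponent(1.0f)` yields 127 (unbiased exponent 0 plus the bias).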
- The interval between any two consecutive powers of two.
- A thread is waiting for a resource or data, such as return
data from a pending disk read, or is waiting for another thread to unlock
a resource.
- For Solaris threads, a thread permanently assigned to a
particular LWP is called a bound thread. Bound threads can be scheduled
on a real-time basis in strict priority with respect to all other
active threads in the system, not only within a process. An LWP is an
entity that can be scheduled with the same default scheduling priority
as any UNIX process.
Cache for the SuperSPARC processor is organized into the following
- Small, fast, hardware-controlled memory that acts as a
buffer between a processor and main memory. Cache contains a copy of
the most recently used memory locations--addresses and
contents--of instructions and data. Every address reference
goes first to cache. If the desired instruction or data is not in
cache, a cache miss occurs. The contents are fetched across the bus
from main memory into the CPU register specified in the instruction
being executed and a copy is also written to cache. It is likely that
the same location will be used again soon, and, if so, the address is
found in cache, resulting in a cache hit. If a write to that address
occurs, the hardware not only writes to cache, but can also generate a
write-through to main memory.
Tables 1, 2, and 3 summarize the characteristics of the SuperSPARC cache.
See associativity, circuit switching, direct
mapped cache, fully associative cache, MBus,
packet switching, set associative cache,
write-back, write-through, XDBus.
Temporal locality (locality in time) is the tendency to reuse recently
accessed items. For example, most programs contain loops, so
instructions and data are likely to be accessed repeatedly. Caching
exploits temporal locality by keeping recently accessed items in cache,
closer to the processor, rather than requiring a memory access. See cache,
competitive-caching, false sharing, write-invalidate, write-update.
- A program does not access all of its code or data at once
with equal probability. Having recently accessed information in cache
increases the probability of finding information locally without having
to access memory. The principle of locality states that programs access
a relatively small portion of their address space at any instant of
time. There are two different types of locality: temporal and spatial.
Spatial locality (locality in space) is the tendency to reference items
whose addresses are close to other recently accessed items. For
example, accesses to elements of an array or record show a natural
spatial locality. Caching takes advantage of spatial locality by moving
blocks (multiple contiguous words) from memory into cache and closer to
the processor. See cache, competitive-caching, false
sharing, write-invalidate, write-update.
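To illustrate spatial locality, the two loops in this sketch add up the same row-major C array in two orders. The array size and function names are arbitrary choices for illustration. Both functions return the same sum. Any difference in speed depends on the machine's cache, so no timings are claimed here.

```c
#include <stddef.h>

#define N 512

static double a[N][N];            /* C stores this row by row */

void init_matrix(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] = 1.0;
}

/* Row-order walk: consecutive iterations touch adjacent addresses, so every
   word of a block brought into cache gets used (good spatial locality). */
double sum_by_rows(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-order walk over the same array: consecutive iterations are
   N * sizeof(double) bytes apart, so most of each cached block goes
   unused before it is evicted (poor spatial locality). */
double sum_by_columns(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```

Both loops also reuse the accumulator `s` on every iteration, which is an instance of temporal locality.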
- A hardware feature of some pipeline architectures that
allows the result of an operation to be used immediately as an operand
for a second operation, simultaneously with the writing of the result
to its destination register. The total cycle time of two chained
operations is less than the sum of the stand-alone cycle times for the
instructions. For example, the TI 8847 supports chaining of consecutive
fadd, fsub, and fmul (of the same
precision). Chained faddd/fmuld requires 12
cycles, while consecutive unchained faddd/fmuld requires 17 cycles.
- A mechanism for caches to communicate with each other as
well as with main memory. A dedicated connection (circuit) is
established between caches or between cache and main memory. While a
circuit is in place no other traffic can travel over the bus.
- In systems with multiple caches, the mechanism that ensures
that all processors see the same image of memory at all times.
- The three floating-point exceptions overflow, invalid, and
division by zero are collectively referred to as the common exceptions
for the purposes of ieee_flags(3m) and ieee_handler(3m).
They are called common exceptions because they are commonly trapped as
errors.
- Competitive-caching maintains cache coherence by using a
hybrid of write-invalidate and write-update. Competitive-caching uses a
counter to age shared data. Shared data is purged from cache based on a
least-recently-used (LRU) algorithm. This can cause shared data to
become private data again, thus eliminating the need for the cache
coherency protocol to access memory (via backplane bandwidth) to keep
multiple copies synchronized. See cache, cache locality, false
sharing, write-invalidate, write-update.
- The execution of two or more active threads or processes in
parallel. On a uniprocessor apparent concurrence is accomplished by
rapidly switching between threads. On a multiprocessor system true
parallel execution can be achieved. See asynchronous control,
multiprocessor system, thread.
- Processes that execute in parallel in multiple processors or
asynchronously on a single processor. Concurrent processes can interact
with each other, and one process can suspend execution pending receipt
of information from another process or the occurrence of an external
event. See process, sequential processes.
- For Solaris threads, a condition variable enables threads to
atomically block until a condition is satisfied. The condition is
tested under the protection of a mutex lock. When the condition is
false, a thread blocks on a condition variable and atomically releases
the mutex waiting for the condition to change. When another thread
changes the condition, it can signal the associated condition variable
to cause one or more waiting threads to wake up, reacquire the mutex,
and re-evaluate the condition. Condition variables can be used to
synchronize threads in this process and other processes if the variable
is allocated in memory that is writable and shared among the
cooperating processes and has been initialized for this behavior.
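The wait-and-recheck pattern described above is sketched below with the POSIX equivalents (`pthread_mutex_t`, `pthread_cond_t`) instead of the Solaris threads calls. The choice of API is an assumption. The `ready`/`payload` condition and the function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;
static bool ready = false;       /* the condition; protected by lock */
static int  payload = 0;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    payload = 42;
    ready = true;                    /* change the condition ...  */
    pthread_cond_signal(&changed);   /* ... and wake one waiter   */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int wait_for_payload(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);

    pthread_mutex_lock(&lock);
    /* Test the condition under the mutex and re-test after each wakeup.
       pthread_cond_wait atomically releases the mutex while blocked and
       reacquires it before returning. */
    while (!ready)
        pthread_cond_wait(&changed, &lock);
    int value = payload;
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return value;
}
```

The `while` loop, instead of a single `if`, handles spurious wakeups as well as the case where another waiter changes the condition first.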
control flow model||
- In multitasking operating systems, such as the SunOS operating
system, processes run for a fixed time quantum. At the end of the time
quantum, the CPU receives a signal from the timer, interrupts the currently
running process, and prepares to run a new process. The CPU saves the
registers for the old process, and then loads the registers for the new
process. Switching from the old process state to the new is known as a
context switch. Time spent switching contexts is system overhead; the time
required depends on the number of registers, and on whether there are
special instructions to save the registers associated with a process.
- The von Neumann model of a computer. This model specifies
flow of control; that is, which instruction is executed at each step of
a program. All Sun workstations are instances of the von Neumann model.
See data flow model, demand-driven dataflow.
- An indivisible section of code that can only be executed by
one thread at a time and is not interruptible by other threads; such
as, code that accesses a shared variable. See mutual exclusion,
mutex lock, semaphore lock, single-lock strategy, spin lock.
data flow model||
- A resource that can only be in use by at most one thread at
any given time. Where several asynchronous threads are required to
coordinate their access to a critical resource, they do so by
synchronization mechanisms. See mutual exclusion, mutex lock,
semaphore lock, single-lock strategy, spin lock.
- This computer model specifies what happens to data, and
ignores instruction order. That is, computations move forward as data
values become available rather than as instructions become available.
See control flow model, demand-driven dataflow.
- In multithreading, a situation where two or more threads
simultaneously access a shared resource and at least one of them
modifies it. The results depend on the order in which the threads
access the resource and are therefore indeterminate. This
situation, called a data race, can produce different results when a
program is run repeatedly with the same input. See mutual
exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
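A classic data race is several threads incrementing one shared counter. `counter++` is a read-modify-write, so two threads can read the same old value, and one of the updates is lost. The sketch below removes the race by turning each increment into a critical region guarded by a POSIX mutex. The thread and iteration counts and all names are illustrative assumptions.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_lock);   /* enter critical region */
        counter++;                           /* read-modify-write, now exclusive */
        pthread_mutex_unlock(&counter_lock); /* leave critical region */
    }
    return NULL;
}

long count_with_lock(void)
{
    pthread_t t[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

With the lock in place, the result is always NTHREADS * NITERS (400000). If the lock and unlock calls are removed, repeated runs can return different, smaller totals.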
- A situation that can arise when two (or more) separately
active processes compete for resources. Suppose that process P requires
resources X and Y and requests their use in that order at the same time
that process Q requires resources Y and X and asks for them in that
order. If process P has acquired resource X and simultaneously process
Q has acquired resource Y, then neither process can
proceed--each process requires a resource that has been
allocated to the other process.
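A common way to prevent the P/Q deadlock described above is to impose a single global order on the resources and always acquire them in that order. The sketch below applies that technique using threads in place of processes and POSIX mutexes in place of resources X and Y. The `struct resource` type, its `rank` field, and the helper names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical resource: each lock carries a fixed rank, and every thread
   acquires locks in increasing rank order. */
struct resource {
    pthread_mutex_t m;
    int rank;
    long uses;
};

static struct resource X = { PTHREAD_MUTEX_INITIALIZER, 1, 0 };
static struct resource Y = { PTHREAD_MUTEX_INITIALIZER, 2, 0 };

/* Lock both resources in rank order, whatever order the caller names them. */
static void lock_both(struct resource *a, struct resource *b)
{
    if (a->rank > b->rank) { struct resource *t = a; a = b; b = t; }
    pthread_mutex_lock(&a->m);
    pthread_mutex_lock(&b->m);
}

static void unlock_both(struct resource *a, struct resource *b)
{
    pthread_mutex_unlock(&a->m);
    pthread_mutex_unlock(&b->m);
}

static void *process_P(void *arg)          /* wants X, then Y */
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_both(&X, &Y);
        X.uses++; Y.uses++;
        unlock_both(&X, &Y);
    }
    return NULL;
}

static void *process_Q(void *arg)          /* wants Y, then X */
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_both(&Y, &X);   /* still takes X first, so no wait cycle can form */
        X.uses++; Y.uses++;
        unlock_both(&Y, &X);
    }
    return NULL;
}

long run_P_and_Q(void)
{
    pthread_t p, q;
    pthread_create(&p, NULL, process_P, NULL);
    pthread_create(&q, NULL, process_Q, NULL);
    pthread_join(p, NULL);
    pthread_join(q, NULL);
    return X.uses + Y.uses;
}
```

If `lock_both` locked its arguments in the order given, P could hold X while Q held Y, which is exactly the deadlock described above.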
- The value that is delivered as the result of a
floating-point operation that caused an exception.
- A task is enabled for execution by a processor when its
results are required by another task that is also enabled, for example,
in a graph reduction model. A graph reduction program consists of reducible
expressions that are replaced by their computed values as the
computation progresses through time. Most of the time, the reductions
are done in parallel--nothing prevents parallel reductions
except the availability of data from previous reductions. See
control flow model, data flow model.
direct mapped cache||
- Older nomenclature for subnormal number.
distributed memory architecture||
- A direct mapped cache is a one-way set associative cache.
That is, each cache entry holds one block and forms a single
set with one element. See cache, cache locality, false
sharing, fully associative cache, set associative cache.
- A combination of local memory and processors at each node of
the interconnection network topology. Each processor can directly access
only a portion of the total memory of the system. Message passing is
used to communicate between any two processors, and there is no global,
shared memory. Therefore, when a data structure must be shared, the
program issues send/receive messages to the process that owns that
structure. See interprocess communication, message passing.
- Using two words to represent a number in order to keep or
increase precision. On SPARC workstations, double precision is the
64-bit IEEE double precision.
- An arithmetic exception arises when an attempted atomic
arithmetic operation has no result that is acceptable universally. The
meanings of atomic and acceptable vary with time and place.
- The component of a floating-point number that signifies the
integer power to which the base is raised in determining the value of
the represented number.
floating-point number system
- A condition that occurs in cache when two unrelated data items
accessed independently by two threads reside in the same block. This
block can end up 'ping-ponging' between caches for no valid
reason. Recognizing such a case and rearranging the data structure to
eliminate the false sharing can greatly improve cache performance. See
cache, cache locality.
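One common way to rearrange data so that two threads stop falsely sharing a block is to pad each thread's data out to a full cache block. This sketch assumes a 64-byte block, which is typical of current hardware but is machine-specific. The names are hypothetical. C11 `_Alignas` is an alternative to manual padding.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumption: 64-byte cache blocks; check the target machine. */
#define CACHE_BLOCK 64

/* Without pad[], the two counters below would usually share one block and
   ping-pong between the two threads' caches. With it, their value fields
   are CACHE_BLOCK bytes apart and therefore lie in different blocks. */
struct padded_counter {
    long value;
    char pad[CACHE_BLOCK - sizeof(long)];
};

static struct padded_counter counters[2];

static void *bump(void *arg)
{
    struct padded_counter *c = arg;
    for (int i = 0; i < 1000000; i++)
        c->value++;              /* each thread writes only its own block */
    return NULL;
}

long run_two_counters(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, bump, &counters[0]);
    pthread_create(&t1, NULL, bump, &counters[1]);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return counters[0].value + counters[1].value;
}
```

The padding does not change the program's result. It changes only which data share a cache block, and therefore how much coherence traffic the two threads generate.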
fully associative cache||
- A system for representing a subset of real numbers in which
the spacing between representable numbers is not a fixed, absolute
constant. Such a system is characterized by a base, a sign, a
significand, and an exponent (usually biased). The value of the number
is the signed product of its significand and the base raised to the
power of the unbiased exponent.
- A fully associative cache with m entries is an
m-way set associative cache. That is, it has a single
set with m blocks. A cache entry can reside in any of the
m blocks within that set. See cache, cache locality,
direct mapped cache, false sharing, set associative cache.
- When a floating-point operation underflows, return a
subnormal number instead of 0. This method of handling underflow
minimizes the loss of accuracy in floating-point calculations on small
numbers.
IEEE Standard 754
- Extra bits used by hardware to ensure correct rounding, not
accessible by software. For example, IEEE double precision operations
use three hidden bits to compute a 56-bit result that is then rounded
to 53 bits.
- The standard for binary floating-point arithmetic developed
by the Institute of Electrical and Electronics Engineers, published in
1985.
interconnection network topology||
- A fragment of assembly language code that is substituted for
the function call it defines, during the inlining pass of ProCompilers.
Used (for example) by the math library in in-line template files
(libm.il) in order to access hardware implementations of
trigonometric functions and other elementary functions from C programs.
- Interconnection topology describes how the processors are
connected. All networks consist of switches whose links go to
processor-memory nodes and to other switches. There are four generic
forms of topology: star, ring, bus, and fully-connected network. Star
topology consists of a single hub processor with the other processors
directly connected to the hub; the non-hub processors are not
directly connected to each other. In ring topology all processors are
on a ring, and communication is generally in one direction around the
ring. Bus topology is noncyclic, with all nodes connected;
consequently, traffic travels in both directions, and some form of
arbitration is needed to determine which processor can use the bus at
any particular time. In a fully-connected (crossbar) network, every
processor has a bidirectional link to every other processor.
Commercially available parallel processors use multistage network
topologies, such as the 2-dimensional grid and the boolean n-cube.
- Message passing among active processes. See circuit
switching, distributed memory architecture, MBus, message passing,
packet switching, shared memory, XDBus.
- See interprocess communication.
- Solaris threads are implemented as a user-level library that
uses the kernel's threads of control, called light-weight
processes (LWPs). In Solaris 2.2 and above, a process is a collection
of LWPs that share memory. Each LWP has the scheduling priority of a
UNIX process and shares the resources of that process. LWPs coordinate
their access to the shared memory by using synchronization mechanisms
such as locks. An LWP can be thought of as a virtual CPU that executes
code or system calls. The threads library schedules threads on a pool
of LWPs in the process, in much the same way as the kernel schedules
LWPs on a pool of processors. Each LWP is independently dispatched by
the kernel, performs independent system calls, incurs independent page
faults, and runs in parallel on a multiprocessor system. The LWPs are
scheduled by the kernel onto the available CPU resources according to
their scheduling class and priority.
- A mechanism for enforcing a policy for serializing access to
shared data. A thread or process uses a particular lock in order to
gain access to shared memory protected by that lock. The locking and
unlocking of data is voluntary in the sense that only the programmer
knows what must be locked. See data race, mutual exclusion, mutex
lock, semaphore lock, single-lock strategy, spin lock.
- See light-weight process.
- MBus is a bus specification for a processor/memory/IO
interconnect. It is licensed by SPARC International to several silicon
vendors who produce interoperating CPU modules, IO interfaces and
memory controllers. MBus is a circuit-switched protocol combining read
requests and responses on a single bus. MBus level I defines
uniprocessor signals; MBus level II defines multiprocessor extensions
for the write-invalidate cache coherence mechanism.
- A medium that can retain information for subsequent
retrieval. The term is most frequently used for referring to a
computer's internal storage that can be directly addressed by
machine instructions. See cache, distributed memory, shared memory.
- In the distributed memory architecture, a mechanism for
processes to communicate with each other. There is no shared data
structure in which they deposit messages. Message passing allows a
process to send data to another process and for the intended recipient
to synchronize with the arrival of the data.
- See Multiple Instruction Multiple Data, shared memory.
Multiple Instruction Multiple Data||
- In Solaris 2.2 and above, functions inside libraries
are either mt-safe or not mt-safe; mt-safe code is also called
"re-entrant" code. That is, several threads can simultaneously
call a given function in a module and it is up to the function code to
handle this. The assumption is that data shared between threads is only
accessed by module functions. If mutable global data is available to
clients of a module, appropriate locks must also be made visible in the
interface. Furthermore, the module function cannot be made re-entrant
unless the clients are assumed to use the locks consistently and at
appropriate times. See single-lock strategy.
multiple read single write||
- System model where many processors can be simultaneously
executing different instructions on different data. Furthermore, these
processors operate in a largely autonomous manner as if they are
separate computers. They have no central controller, and they typically
do not operate in lock-step fashion. Most real world banks run this
way. Tellers do not consult with one another, nor do they perform each
step of every transaction at the same time. Instead, they work on their
own, until a data access conflict occurs. Processing of transactions
occurs without concern for timing or customer order. But customers A
and B must be explicitly prevented from simultaneously accessing the
joint AB account balance. MIMD relies on synchronization mechanisms
called locks to coordinate access to shared resources. See mutual
exclusion, mutex lock, semaphore lock, single-lock strategy, spin lock.
- In a concurrent environment, the first process to access
data for writing has exclusive access to it, making concurrent write
access or simultaneous read and write access impossible. However, the
data can be read by multiple readers.
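The POSIX read-write lock (`pthread_rwlock_t`) is one implementation of this policy. Using it here is an assumption, since the glossary names no API. The sketch shows that a second reader is admitted while a read lock is held and that a writer is refused. The shared variable and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_rwlock_t */
#include <pthread.h>
#include <errno.h>

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value = 0;

int read_value(void)
{
    pthread_rwlock_rdlock(&table_lock);   /* many readers may hold this at once */
    int v = shared_value;
    pthread_rwlock_unlock(&table_lock);
    return v;
}

void write_value(int v)
{
    pthread_rwlock_wrlock(&table_lock);   /* excludes all readers and writers */
    shared_value = v;
    pthread_rwlock_unlock(&table_lock);
}

/* Returns 1 if, while a read lock is held, another read lock is granted
   and a write lock is refused with EBUSY. */
int readers_share_writers_wait(void)
{
    pthread_rwlock_rdlock(&table_lock);
    int second_reader  = pthread_rwlock_tryrdlock(&table_lock) == 0;
    int writer_refused = pthread_rwlock_trywrlock(&table_lock) == EBUSY;
    if (second_reader)
        pthread_rwlock_unlock(&table_lock);
    pthread_rwlock_unlock(&table_lock);
    return second_reader && writer_refused;
}
```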
- See multiprocessor system.
- In a shared memory multiprocessor machine, each CPU and cache
module is connected to a bus that also includes memory and
IO connections. The bus enforces a cache coherency protocol. See
cache, coherence, MBus, XDBus.
- A system in which more than one processor can be active at
any given time. While the processors are actively executing separate
processes, they run completely asynchronously. However, synchronization
between processors is essential when they access critical system
resources or critical regions of system code. See critical region,
critical resource, multithreading, uniprocessor system.
- In a uniprocessor system, a large number of threads appear
to be running in parallel. This is accomplished by rapidly switching
between threads.
- Applications that can have more than one thread or processor
active at one time. Multithreaded applications can run in both
uniprocessor systems and multiprocessor systems. See bound thread,
mt-safe, single-lock strategy, thread, unbound thread.
- Synchronization variable to implement the mutual exclusion
mechanism. See condition variable, mutual exclusion.
- In a concurrent environment, the ability of a thread to
update a critical resource without accesses from competing threads. See
critical region, critical resource.
- Stands for Not a Number. A symbolic entity that is encoded
in floating-point format.
- In IEEE arithmetic, a number with a biased exponent that is
neither zero nor maximal (all 1's), representing a subset of the
normal range of real numbers with a bounded small relative error.
- In the shared memory architecture, a mechanism for caches to
communicate with each other as well as with main memory. In packet
switching, traffic is divided into small segments called packets that
are multiplexed onto the bus. A packet carries identification that
enables cache and memory hardware to determine whether the packet is
destined for it or to send the packet on to its ultimate destination.
Packet switching allows bus traffic to be multiplexed and unordered
(not sequenced) packets to be put on the bus. The unordered packets are
reassembled at the destination (cache or main memory). See cache.
- A model of the world that is used to formulate a computer
solution to a problem. Paradigms provide a context in which to
understand and solve a real-world problem. Because a paradigm is a
model, it abstracts the details of the problem from the reality, and in
doing so, makes the problem easier to solve. Like all abstractions,
however, the model can be inaccurate because it only approximates the
real world. See Multiple Instruction Multiple Data, Single
Instruction Multiple Data, Single Instruction Single Data, Single
Program Multiple Data.
- In a multiprocessor system, true parallel execution is
achieved where a large number of threads or processes can be active at
one time. See concurrence, multiprocessor system, multithreading.
- See concurrent processes, multithreading.
- If the total function applied to the data can be divided
into distinct processing phases, different portions of data can flow
along from phase to phase; such as a compiler with phases for lexical
analysis, parsing, type checking, code generation and so on. As soon as
the first program or module has passed the lexical analysis phase, it
can be passed on to the parsing phase while the lexical analyzer starts
on the second program or module. See array processing, vector processing.
- A hardware feature where operations are divided into multiple
stages, each of which (typically) takes one cycle to complete. The
pipeline is filled when new operations can be issued each cycle. If
there are no dependencies among instructions in the pipe, new results
can be delivered each cycle. Chaining implies pipelining of dependent
instructions. If dependent instructions cannot be chained because the
hardware does not support chaining of those particular instructions,
the pipeline stalls.
- A quantitative measure of the density of representable
numbers. "IEEE double precision format specifies 53 bits of
precision" implies that the relative representation error in the
normal range is bounded by 2^-52.
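For IEEE double precision, 53 bits of precision means that the spacing between 1.0 and the next larger representable number is 2^-52. (With round-to-nearest, the representation error is at most half that spacing.) This sketch finds the spacing by incrementing the bit pattern of 1.0. It assumes `double` is IEEE double precision. The function name is hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Spacing between 1.0 and the next larger double, found by adding one to
   the 64-bit encoding of 1.0 (assumes IEEE double format). */
double spacing_above_one(void)
{
    double one = 1.0, next;
    uint64_t bits;
    memcpy(&bits, &one, sizeof bits);
    bits += 1;                         /* next representable double */
    memcpy(&next, &bits, sizeof next);
    return next - one;                 /* exact: 2^-52 */
}
```

The result equals `DBL_EPSILON` from `<float.h>`, and `DBL_MANT_DIG` is 53 on such systems.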
- A unit of activity characterized by a single sequential
thread of execution, a current state, and an associated set of system
resources.
- A NaN (not a number) that propagates through almost every
arithmetic operation without raising new exceptions.
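Quiet NaN propagation can be observed with the C99 `NAN` macro. `NAN` is a quiet NaN on implementations that support one. This sketch assumes IEEE arithmetic and default compilation, since options such as GCC's `-ffast-math` let the compiler assume that NaNs never occur. The function name is hypothetical.

```c
#include <math.h>

int nan_propagates(void)
{
    double q = NAN;                       /* quiet NaN */
    double r = (q + 1.0) * 2.0 - 3.0;     /* flows through every operation */
    return isnan(r) && r != r;            /* a NaN compares unequal to itself */
}
```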
- The base number of any system of numbers. For example, 2 is
the radix of a binary system, and 10 is the radix of the decimal system
of numeration. SPARC workstations use radix-2 arithmetic; IEEE Std 754
is a radix-2 arithmetic standard.
- Inexact results must be rounded up or down to obtain
representable values. When a result is rounded up, it is increased to
the next representable value. When rounded down, it is reduced to the
preceding representable value.
- The error introduced when a real number is rounded to a
machine-representable number. Most floating-point calculations incur
roundoff error. For any one floating-point operation, IEEE Std 754
specifies that the result shall not incur more than one rounding error.
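A familiar case of roundoff error: 0.1, 0.2, and 0.3 are each rounded when they are converted to binary, and the addition adds one more correctly rounded step. The errors do not cancel. This sketch assumes IEEE double arithmetic with round-to-nearest and no extended-precision evaluation of intermediates, as on x86-64 with SSE. The function name is hypothetical.

```c
/* Returns (0.1 + 0.2) - 0.3 evaluated in double precision. */
double roundoff_error(void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3;  /* volatile: evaluate at run time */
    double sum = a + b;     /* one correctly rounded addition */
    return sum - c;         /* exact here, since sum and c are so close */
}
```

The sum rounds to 0.30000000000000004, while the stored value of 0.3 is slightly below 0.3. The two doubles are one ulp apart, so the function returns 2^-54 instead of 0.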
- Synchronization mechanism for controlling access to critical
resources by cooperating asynchronous threads. See semaphore.
- A special-purpose data type introduced by E. W. Dijkstra
that coordinates access to a particular resource or set of shared
resources. A semaphore has an integer value (that cannot become
negative) with two operations allowed on it. The
signal (V or up) operation increases the value by
one, and in general indicates that a resource has become free. The wait
(P or down) operation decreases the value by one,
when that can be done without the value going negative, and in general
indicates that a free resource is about to start being used. See
semaphore lock.
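The P and V operations map onto POSIX unnamed semaphores: `sem_wait` or `sem_trywait` for P, and `sem_post` for V. Using POSIX semaphores is an assumption, and unnamed semaphores are not supported on some systems (macOS, for example). The sketch guards two hypothetical identical resources. It uses `sem_trywait` so that the refused third P returns an error instead of blocking.

```c
#include <semaphore.h>
#include <errno.h>

/* A counting semaphore guarding two identical resources (hypothetical). */
int semaphore_demo(void)
{
    sem_t free_slots;
    sem_init(&free_slots, 0, 2);            /* value 2: two resources free */

    int got1 = sem_trywait(&free_slots) == 0;   /* P: 2 -> 1 */
    int got2 = sem_trywait(&free_slots) == 0;   /* P: 1 -> 0 */
    /* A third P would make the value negative, so it is refused here
       (sem_wait would block instead). */
    int refused = sem_trywait(&free_slots) == -1 && errno == EAGAIN;

    sem_post(&free_slots);                  /* V: 0 -> 1, a resource is free again */
    int got3 = sem_trywait(&free_slots) == 0;

    sem_destroy(&free_slots);
    return got1 && got2 && refused && got3;
}
```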
set associative cache||
- Processes that execute in such a manner that one must finish
before the next begins. See concurrent processes.
shared memory architecture||
- In a set associative cache, there are a fixed number of
locations (at least two) where each block can be placed. A set
associative cache with n locations for a block is called an
n-way set associative cache. An n-way set associative
cache consists of more than one set, each of which consists of
n blocks. A block can be placed in any location (element) of
that set. Increasing the associativity level (number of blocks in a
set) generally increases the cache hit rate. See cache, cache locality, false
sharing, write-invalidate, write-update.
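The set a block maps to is normally chosen from the block's address, and the block may then occupy any of the n ways in that set. This sketch uses a made-up geometry (64-byte blocks, 128 sets) purely for illustration. A direct mapped cache is the case of one way per set, and a fully associative cache is the case of a single set.

```c
#include <stdint.h>

/* Hypothetical cache geometry for illustration only. */
#define BLOCK_BYTES 64u
#define NUM_SETS    128u

/* Which set a memory address maps to: drop the offset within the block,
   then take the block number modulo the number of sets. */
unsigned set_index(uintptr_t addr)
{
    uintptr_t block_number = addr / BLOCK_BYTES;
    return (unsigned)(block_number % NUM_SETS);
}
```

With this geometry, addresses 0 and 8192 (64 * 128) fall into the same set. In a direct mapped cache, those two blocks would evict each other. In a 2-way set associative cache, both could stay resident at the same time.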
- In a bus-connected multiprocessor system, processes or
threads communicate through a global memory shared by all processors.
This shared data segment is placed in the address space of the
cooperating processes between their private data and stack segments.
Subsequent tasks spawned by fork() copy all but the shared
data segment in their address space. Shared memory requires program
language extensions and library routines to support the model.
- A NaN (not a number) that raises the invalid operation
exception whenever it appears as an operand.
- The component of a floating-point number that is multiplied
by a signed power of the base to determine the value of the number. In
a normalized number, the significand consists of a single nonzero digit
to the left of the radix point and a fraction to the right.
Single Instruction Multiple Data||
- See Single Instruction Multiple Data.
Single Instruction Single Data||
- System model where there are many processing elements, but
they are designed to execute the same instruction at the same time;
that is, one program counter is used to sequence through a single copy
of the program. SIMD is especially useful for solving problems that
have lots of data that need to be updated on a wholesale basis, such
as numerical calculations that are regular. Many scientific and
engineering applications (such as image processing, particle
simulation, and finite element methods) naturally fall into the SIMD
paradigm. See array processing, pipeline, vector processing.
- The conventional uniprocessor model, with a single processor
fetching and executing a sequence of instructions that operate on the
data items specified within them. This is the original von Neumann
model of the operation of a computer.
Single Program Multiple Data||
- Using one computer word to represent a number.
- A form of asynchronous parallelism where simultaneous
processing of different data occurs without lock-step coordination. In
SPMD, processors can execute different instructions at the same time,
such as different branches of an if statement.
- In the single-lock strategy, a thread acquires a single,
application-wide mutex lock whenever any thread in the application is
running and releases the lock before the thread blocks. The single-lock
strategy requires cooperation from all modules and libraries in the
system to synchronize on the single lock. Because only one thread can
be accessing shared data at any given time, each thread has a
consistent view of memory. This strategy is quite effective in a
uniprocessor, provided shared memory is put into a consistent state
before the lock is released and that the lock is released often enough
to allow other threads to run. Furthermore, in uniprocessor systems,
concurrency is diminished if the lock is not dropped during most I/O
operations. The single-lock strategy cannot be applied in a
multiprocessor system.
- See Single Instruction Single Data.
- The most popular protocol for maintaining cache coherency is
called snooping. Cache controllers monitor, or snoop on, the bus to
determine whether or not the cache contains a copy of a shared block.
For reads, multiple copies can reside in the caches of different
processors, but because the processors need the most recent copy, all
processors must get new values after a write.
For writes, a processor must have exclusive
access to write to cache. Writes to unshared blocks do not cause bus
traffic. The consequence of a write to shared data is either to
invalidate all other copies or to update the shared copies with the
value being written. See cache, competitive-caching, false sharing,
write-invalidate, write-update.
- Threads use a spin lock to test a lock variable over and
over until some other task releases the lock. That is, the waiting
thread spins on the lock until the lock is cleared. Then, the waiting
thread sets the lock while inside the critical region. After work in
the critical region is complete, the thread clears the spin lock so
another thread can enter the critical region. The difference between a
spin lock and a mutex is that an attempt to get a mutex held by someone
else will block and release the LWP; a spin lock does not release the
LWP. See mutex lock.
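The spin-then-set-then-clear sequence described above can be sketched with C11 `atomic_flag`. This uses standard C11 atomics, not a Solaris API, and all names are hypothetical. A waiting thread burns CPU in the loop instead of blocking.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_flag spin = ATOMIC_FLAG_INIT;
static long shared_count = 0;

static void spin_lock(void)
{
    /* test_and_set returns the previous value: keep spinning while it
       was already set, i.e. while another thread holds the lock. */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        ;                                   /* busy-wait; the thread never blocks */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

static void *spinner(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock();
        shared_count++;                     /* critical region */
        spin_unlock();
    }
    return NULL;
}

long run_spinners(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, spinner, NULL);
    pthread_create(&b, NULL, spinner, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_count;
}
```

Spinning pays off only when critical regions are very short. For longer waits, a blocking mutex releases the processor to other work.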
- See Single Program Multiple Data.
- Standard Error is the Unix file pointer to standard error
output. This file is opened when a program is started.
- Flushing the underflowed result of an arithmetic operation to zero.
- In IEEE arithmetic, a nonzero floating point number with a
biased exponent of zero. The subnormal numbers are those between zero
and the smallest normal number.
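Gradual underflow can be seen by halving the smallest normal double. This sketch assumes IEEE double arithmetic with flush-to-zero modes disabled. (For example, GCC's `-ffast-math` can enable flush-to-zero on x86.) The function name is hypothetical.

```c
#include <float.h>

int halving_gives_subnormal(void)
{
    volatile double tiny = DBL_MIN;   /* smallest normal double, 2^-1022 */
    double half = tiny / 2.0;         /* 2^-1023: below the normal range */
    /* Gradual underflow: a nonzero value smaller than the smallest
       normal number, i.e. a subnormal, rather than 0. */
    return half > 0.0 && half < DBL_MIN;
}
```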
- A flow of control within a single UNIX process address
space. Solaris threads provide a light-weight form of concurrent task,
allowing multiple threads of control in a common user-address space,
with minimal scheduling and communication overhead. Threads share the
same address space, file descriptors (when one thread opens a file, the
other threads can read it), data structures, and operating system
state. A thread has a program counter and a stack to keep track of
local variables and return addresses. Threads interact through the use
of shared data and thread synchronization operations. See bound
thread, light-weight processes, multithreading, unbound thread.
- See interconnection network topology.
- The radix complement of a binary numeral, formed by
subtracting each digit from 1, then adding 1 to the least significant
digit and executing any required carries. For example, the two's
complement of 1101 is 0011.
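The procedure above (subtract each digit from 1, then add 1 with carries, keeping the original width) can be written as a short function; `twos_complement` is a hypothetical name used only for illustration.

```python
def twos_complement(bits):
    """Two's complement of a binary numeral given as a string of 0s and 1s."""
    width = len(bits)
    # Subtract each digit from 1, which inverts every bit.
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    # Add 1 to the least significant digit; carries propagate, and any carry
    # out of the top digit is discarded so the result keeps `width` digits.
    value = (int(inverted, 2) + 1) % (1 << width)
    return format(value, "0{}b".format(width))


print(twos_complement("1101"))  # 0011
```

Equivalently, the result is 2^n minus the value of the numeral: 16 - 13 = 3, which is 0011 in four digits.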
- Stands for unit in last place. In binary formats, the least
significant bit of the significand, bit 0, is the unit in the last
place.
- Stands for ulp of x truncated in working precision.
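The two ulp entries above can be illustrated in IEEE double precision, assumed here to be the working precision. Adding 1 to bit 0 of the significand of a positive finite double yields the next larger double, and the gap between them is ulp(x); Python's `math.ulp` (available from Python 3.9) returns the same value. The helper `bump_last_bit` is hypothetical.

```python
import math
import struct


def bump_last_bit(x):
    """Add 1 to bit 0 of the significand of a positive, finite double
    (assumed to be below the largest finite double)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return struct.unpack(">d", struct.pack(">Q", bits + 1))[0]


x = 1.0
ulp_from_bits = bump_last_bit(x) - x
print(ulp_from_bits == 2.0 ** -52)   # True
print(math.ulp(x) == ulp_from_bits)  # True
print(math.ulp(1000.0) == 2.0 ** -43)  # True: ulp grows with the magnitude of x
```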
- For Solaris threads, threads scheduled onto a pool of LWPs
are called unbound threads. The threads library invokes and assigns
LWPs to execute runnable threads. If a thread becomes blocked on a
synchronization mechanism (such as a mutex lock), the state of the
thread is saved in process memory. The threads library then assigns
another thread to the LWP. See bound thread, multithreading.
- A condition that occurs when the result of a floating-point
arithmetic operation is so small that it cannot be represented as a
normal number in the destination floating-point format.
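The following sketch, assuming IEEE double precision with round-to-nearest (Python's default), shows an underflow and the accuracy it can cost. Dividing the smallest normal number by 3 produces a nonzero result below the normal range, and multiplying back by 3 no longer recovers the original value, whereas the same operations on 1.0 do.

```python
import sys

m = sys.float_info.min   # smallest positive normal double, 2**-1022

tiny = m / 3             # underflows: nonzero, but too small to be normal
print(0.0 < tiny < m)    # True: the result is subnormal
print(tiny * 3 == m)     # False: precision was lost in the underflow

normal = 1.0 / 3         # the same operations well inside the normal range
print(normal * 3 == 1.0)  # True
```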
- A uniprocessor system has only one processor active at any
given time. This single processor can run multithreaded applications as
well as the conventional single instruction single data model. See
multithreading, single instruction single data, single-lock strategy.
- Processing of sequences of data in a uniform manner, a
common occurrence in manipulation of matrices (whose rows and columns
are vectors) or other arrays of data. This orderly progression of data
can capitalize on the use of pipeline processing. See array processing.
- An ordered set of characters that are stored, addressed,
transmitted and operated on as a single entity within a given computer.
In the context of SPARC workstations, a word is 32 bits.
- In IEEE arithmetic, a number created from a value that would
otherwise overflow or underflow, by adding a fixed offset to its
exponent so that the wrapped value lies in the normal number range.
Wrapped results are not currently produced on SPARC workstations.
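A sketch of producing a wrapped result in software, for illustration only. The offset of 1536 is assumed from the IEEE 754-1985 exponent adjustment for double-precision overflow and underflow trap handlers; the entry itself does not state the offset. Scaling one operand by a power of two first is exact as long as it stays normal, so only the final multiply rounds.

```python
import math
import sys

WRAP_OFFSET_DOUBLE = 1536  # assumed IEEE 754-1985 exponent adjustment for doubles

a = b = 1e300
print(math.isinf(a * b))   # True: the ordinary product overflows

# Subtract the offset from the exponent before multiplying; the exact
# product is therefore divided by 2**1536 and lands in the normal range.
wrapped = math.ldexp(a, -WRAP_OFFSET_DOUBLE) * b
print(sys.float_info.min < wrapped < sys.float_info.max)  # True
```

A trap handler receiving `wrapped` can recover the true magnitude by remembering that its exponent is low by 1536.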
- Write policy for maintaining coherency between cache and
main memory. Write-back (also called copy back or store in) writes only
to the block in local cache, so writes occur at the speed of cache
memory. The modified cache block is written to main memory only when the
corresponding memory address is referenced by another processor (or
when the block is replaced). The processor can therefore write within a
cache block many times while main memory is updated only once. Because
not every write goes to memory, write-back reduces demands on bus
bandwidth. See cache, coherence, write-through.
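A toy model of the write-back policy, assuming one-word blocks and a dictionary standing in for main memory. The class `WriteBackCache` and its `flush` method are hypothetical; `flush` models the moment another processor references the address (or the block is replaced). The model counts how many writes actually reach memory.

```python
class WriteBackCache:
    """Toy write-back cache: writes go to the local block and mark it dirty."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value (main memory)
        self.blocks = {}      # address -> [value, dirty flag]
        self.memory_writes = 0

    def write(self, addr, value):
        # Write only to the local cache block, at cache speed.
        self.blocks[addr] = [value, True]

    def flush(self, addr):
        # Another processor referenced addr (or the block is being replaced):
        # copy a dirty block back to main memory once.
        block = self.blocks.get(addr)
        if block is not None and block[1]:
            self.memory[addr] = block[0]
            self.memory_writes += 1
            block[1] = False


memory = {0x100: 0}
cache = WriteBackCache(memory)
for i in range(10):
    cache.write(0x100, i)                   # ten writes to the same block
print(memory[0x100], cache.memory_writes)   # 0 0: memory not yet updated
cache.flush(0x100)
print(memory[0x100], cache.memory_writes)   # 9 1: one memory write in total
```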
- Maintains cache coherence by letting processors read from their
local caches until a write occurs. To change the value of a variable,
the writing processor first invalidates all copies in other caches: it
issues an invalidation signal over the bus, and every cache checks
whether it holds a copy; if so, it must invalidate the block containing
the word. The writing processor is then free to update its local copy
until another processor asks for the variable. This scheme allows
multiple readers but only a single writer. Write-invalidate uses the bus
only on the first write to invalidate the other copies; subsequent local
writes do not result in bus traffic, thus reducing demands on bus
bandwidth. See cache, cache locality, coherence, false sharing.
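A toy simulation of the write-invalidate scheme, assuming one-word blocks and two line states: "S" (clean, possibly shared) and "M" (modified, sole copy). `Bus` and `InvalidateCache` are hypothetical classes for illustration, not a model of any particular bus protocol. The bus counts read misses and invalidation broadcasts, showing that only the first of a run of writes uses the bus.

```python
class Bus:
    """Shared bus that counts transactions (read misses and invalidations)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value
        self.caches = []
        self.transactions = 0


class InvalidateCache:
    """Toy write-invalidate cache; each line is [value, state]."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def read(self, addr):
        if addr in self.lines:                   # hit: no bus traffic
            return self.lines[addr][0]
        self.bus.transactions += 1               # miss: fetch over the bus
        for other in self._others():
            line = other.lines.get(addr)
            if line is not None and line[1] == "M":
                self.bus.memory[addr] = line[0]  # owner supplies its dirty copy
                line[1] = "S"
        self.lines[addr] = [self.bus.memory[addr], "S"]
        return self.lines[addr][0]

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or line[1] != "M":
            self.bus.transactions += 1           # one invalidation broadcast
            for other in self._others():
                other.lines.pop(addr, None)      # other copies are invalidated
        self.lines[addr] = [value, "M"]          # later writes stay local


bus = Bus({0x40: 0})
p0, p1 = InvalidateCache(bus), InvalidateCache(bus)
p0.read(0x40)
p1.read(0x40)                 # two readers share the word
for v in range(1, 6):
    p0.write(0x40, v)         # only the first write uses the bus
print(0x40 in p1.lines)       # False: p1's copy was invalidated
print(p1.read(0x40), bus.transactions)  # 5 4
```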
- Write policy for maintaining coherency between cache and
main memory. Write-through (also called store through) writes to main
memory as well as to the block in local cache. Write-through has the
advantage that main memory has the most current copy of the data. See
cache, coherence, write-back.
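For contrast with write-back, a toy write-through model under the same assumptions (one-word blocks, a dictionary as main memory); `WriteThroughCache` is a hypothetical name. Every write updates both the cache block and memory, so memory always holds the current value at the cost of one memory write per store.

```python
class WriteThroughCache:
    """Toy write-through cache: each store updates the block and main memory."""

    def __init__(self, memory):
        self.memory = memory  # dict standing in for main memory
        self.blocks = {}      # address -> cached value
        self.memory_writes = 0

    def write(self, addr, value):
        self.blocks[addr] = value  # update the local cache block...
        self.memory[addr] = value  # ...and main memory on the same write
        self.memory_writes += 1


memory = {0x100: 0}
cache = WriteThroughCache(memory)
for i in range(10):
    cache.write(0x100, i)
    assert memory[0x100] == cache.blocks[0x100]  # memory is always current
print(memory[0x100], cache.memory_writes)  # 9 10
```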
- Write-update, also known as write-broadcast, maintains cache
coherence by immediately updating all copies of a shared variable in
all caches. This is a form of write-through because all writes go over
the bus to update copies of shared data. Write-update has the advantage
of making new values appear in cache sooner, which can reduce latency.
See cache, cache locality, coherence, false sharing.
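A toy simulation of write-update (write-broadcast), assuming one-word blocks; `UpdateBus` and `UpdateCache` are hypothetical classes. Every write goes over the bus, is written through to memory, and refreshes any other cached copy in place, so a reader sees the new value without taking a miss, at the cost of one bus transaction per write.

```python
class UpdateBus:
    """Shared bus that counts transactions (read misses and broadcasts)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value
        self.caches = []
        self.transactions = 0


class UpdateCache:
    """Toy write-update cache; each line holds just the cached value."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from main memory
            self.bus.transactions += 1
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.transactions += 1          # every write goes over the bus
        self.lines[addr] = value
        self.bus.memory[addr] = value       # written through to memory
        for other in self.bus.caches:
            if other is not self and addr in other.lines:
                other.lines[addr] = value   # update, rather than invalidate


bus = UpdateBus({0x40: 0})
p0, p1 = UpdateCache(bus), UpdateCache(bus)
p0.read(0x40)
p1.read(0x40)
for v in range(1, 6):
    p0.write(0x40, v)
print(p1.lines[0x40], bus.transactions)  # 5 7
```

Compared with the write-invalidate scheme, the reader never misses, but five writes cost five bus transactions instead of one.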
- The XDBus specification uses low-impedance GTL (Gunning
Transceiver Logic) transceiver signalling to drive longer backplanes at
higher clock rates. XDBus supports a large number of CPUs with multiple
interleaved memory banks for increased throughput. XDBus uses a
packet-switched protocol with split requests and responses for more
efficient bus utilization. XDBus also defines an interleaving scheme so
that one, two, or four separate bus data paths can be used as a single
backplane for increased throughput. XDBus supports write-invalidate,
write-update, and competitive-caching coherency schemes, and has several
congestion-control mechanisms. See cache, coherence, competitive-caching.